MontySolr:
Embedding CPython in Solr
          Roman Chyla, CERN
   roman.chyla@cern.ch, May 26, 2011
Why should I care?
- Our challenge is to connect Python and Java
- Without compromises
- We created the MontySolr extension
   -   Robust, tested (will be used by our system)
   -   But works for any Python application (e.g. Django)
   -   And for any C/C++ app that Python understands!
   -   Open source (GPL v2)
- Try it out!
   - https://github.com/romanchyla/montysolr




                                                           2
Outline

‣ Context
- The Challenge
- Key components
  - Available technologies
  - Our approach
  - Problems solved
- Evaluation
- Wrap-up



                             3
CERN
- European Organization for Nuclear Research
  - Switzerland, Geneva
- The largest laboratory for High Energy Physics
- Home to the Large Hadron Collider
- 40-50K HEP scientists worldwide




                                                   4
SPIRES
- Stanford Linear Accelerator Center - SLAC
- High-Energy Physics Literature Database
- Started December 1991
  - The first web server outside Europe/CERN
  - The first database on the web




                                              5
Invenio
- Integrated digital library software behind INSPIRE
- Used by very large institutional repositories
   - http://repositories.webometrics.info/toprep_inst.asp
- Customizable virtual collections
- Flexible management of metadata
   - 3 000 authors per article
- Powerful search engine
   - Incl. citation map analysis
- Written in Python (since 2001)
   - 290 000 lines of code


                                                            8
Outline

- Context
‣ The Challenge
- Key components
  - Available technologies
  - Our approach
  - Problems solved
- Evaluation
- Wrap-up



                             9
The Challenge
- HEP scientific community
   - Searches are metadata-oriented
- However, full texts are changing the situation
- And we want to provide even better service
   - Bigger volumes of data
   - NLP processing
   - Semantic search




                                                 10
The Challenge

  Query: supersymmetry AND author:ellis

  Invenio sends:     fulltext:supersymmetry
  Invenio receives:  IDs: 1;2;3;9.... (1-6M IDs)

  1. only IDs, no score = no ranking
  2. score merging difficult (if available)
  3. push IDs? (e.g. faceting)

                                                 11
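A minimal sketch of the merge step described above, assuming the fulltext engine returns a ';'-separated ID string and the metadata index returns a plain list of record IDs. The function names and sample data are hypothetical and only illustrate why a result built from bare IDs carries no score to rank by.

```python
def parse_id_string(raw):
    """Parse a ';'-separated ID string such as '1;2;3;9'."""
    return [int(tok) for tok in raw.split(";") if tok.strip()]


def merge_hits(metadata_hits, fulltext_ids):
    """Intersect local metadata hits with IDs from the fulltext engine.

    Only IDs are available, so the result can be ordered by ID
    but not by relevance: there is no score to merge.
    """
    return sorted(set(metadata_hits) & set(fulltext_ids))


# Hypothetical data: hits for author:ellis from the metadata index,
# and the ID string returned for fulltext:supersymmetry.
author_hits = [2, 3, 5, 9, 11]
fulltext_ids = parse_id_string("1;2;3;9")

merged = merge_hits(author_hits, fulltext_ids)
print(merged)  # [2, 3, 9]
```

With 1-6M IDs per sub-query, both the parsing and the set construction become a noticeable cost, which is the motivation for moving the merge closer to the search engine.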
What is the “best” solution?
- We love Python...
- ...and our applications are written in Python...

- But what if Solr is the master search engine?
- Merge results inside Solr?
   - Typical size: 1-10 million IDs
   - Expected latency: 1-2 s
- What we want to achieve:
   - Fast transfer of hits from Invenio to Solr
   - Leverage the power of both (no compromises)
   - Developer-friendly integration, simplicity
- Additional concerns:
                                                     12
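To give a feeling for the transfer sizes involved, here is a small sketch that compares a ';'-separated text encoding of IDs with a packed binary one. It is an illustration only and not how MontySolr moves hits (the slides later show a Java int[] handed over through JCC); the helper names are hypothetical.

```python
from array import array


def pack_ids(ids):
    """Pack record IDs as native C ints (typically 4 bytes each)."""
    return array("i", ids).tobytes()


def unpack_ids(payload):
    """Inverse of pack_ids(): bytes back to a list of ints."""
    out = array("i")
    out.frombytes(payload)
    return out.tolist()


ids = list(range(1, 100001))
as_text = ";".join(map(str, ids)).encode("ascii")
as_binary = pack_ids(ids)

# Compare the payload sizes of the two encodings
print(len(as_text), len(as_binary))
```

The binary form also avoids parsing every ID from text on the receiving side, which matters when the latency budget is 1-2 s.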
Outline

- Context
- The Challenge
‣ Key components
  - Available technologies
  - Our approach
  - Evaluation
- Demonstration
- Wrap-up



                             13
To embed Solr (in Java app)

- Your app simulates Java web container?
  - use EmbeddedSolrServer
- It knows nothing about Java servlets?
  - use the DirectSolrConnection class
- Maybe we are too lazy?
  - Embed the web container (in my case Jetty)
  - Seemed strange (webserver inside webserver)
  - ... but it worked well




                                                  14
To use Solr in non-Java app
- Solr is already usable via HTTP requests, but we
  need something else here...
- Remote objects/calls?
  - Pyro, execnet, CORBA, SOAP...
  - or simply pipes?
- Access Python from Java?
  - Jython
  - JEPP
- Access Java from Python?
  - JPype
  - JCC

                                                     15
Jython?
- Implementation of Python in 100% Java
- Both Java and Python code
- Truly multithreaded



- C modules will not work
  - but see http://bit.ly/iTRYbb
- Slower than CPython




                                          16
JEPP - Java Embedded Python
- Python code runs inside
  Python interpreter
- Embeds CPython interpreter
  via Java Native Interface
  (JNI) in Java
- http://jepp.sourceforge.net/
  - recently updated (27-Jan)
  - but JCC is more active




                                 18
JCC
- Embeds JVM in Python
- C++ code generator
- C++ object interface
  wraps a Java library
- C++ wrappers conform
  to Python's C type
  system
- result: complete Python
  extension module



                            20
To use Solr in non-Java app

                      Jython   JCC    JEPP

Python C modules               ✓      ✓
Speed                          ✓      ?
No code changes                ✓      ✓
Access from Python    ✓        ✓
Access from Java      ✓        ...    ✓
                                     22
The first try


                       Invenio


                Solr




                       JCC



                                 23
The devil is in the details...




                         24
GIL - Global Interpreter Lock

    Unfortunately, a Python web app is not like Java...




                                                      25
GIL - Global Interpreter Lock




We can have 200 threads, but only 4 will run at a time...
                                                          26
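A small sketch of the effect, assuming a standard CPython build where the GIL lets only one thread per interpreter process execute Python bytecode at a time. The function names are hypothetical; the script times the same CPU-bound work run sequentially and in threads, and on such a build the two timings are expected to be in the same range rather than the threaded run being several times faster.

```python
import threading
import time


def count_down(n):
    """CPU-bound loop; the running thread holds the GIL while it executes."""
    while n > 0:
        n -= 1
    return n


def run_sequential(jobs, n):
    """Run the work `jobs` times in the calling thread; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(jobs, n):
    """Run the work in `jobs` threads; return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


sequential = run_sequential(4, 200000)
threaded = run_threaded(4, 200000)
print("sequential: %.3f s, threaded: %.3f s" % (sequential, threaded))
```

I/O-bound threads are less affected because the GIL is released while waiting, but search and citation analysis in pure Python are largely CPU-bound.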
Fortunately, a solution exists
- JCC can embed Python inside Java
   - Special thanks to Andi Vajda! (JCC creator)
- We write ‘empty’ classes in Java ...
- ... and implement them in Python




  Python /w Java inside            Java /w Python inside   28
The second try

                       Solr /w Invenio
    Invenio              (backend)
   frontend

                 XML




                                    JCC


                                          29
Implementing the bridge
- Special Java class
- With method pythonExtension()
- Native method pythonDecRef()
  - JCC provides its implementation
- And a number of other native methods
  - These will be implemented using Python
- Like writing JNI Java/C code but without
  compilation...




                                             30
MontySolr extension
- JCC has great potential, but also added
  complexity...
- So the MontySolr project was born
  - Modules must be built in shared mode
  - JCC dynamic library loaded and started from the main
    thread
  - Simple mechanism of the Python bridge and message
  - Configurable handlers on the Python side
  - Secured dereferencing of the native objects
  - Threading on the Java side
  - Multiprocessing on the Python side
  - Easy ant targets (compilation) ...
                                                           31
Hello World - Java part
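import org.apache.jcc.PythonVM;

public class MontySolrBridge extends BasicBridge implements PythonBridge {

    // Pointer to the Python object that implements this class (set by JCC)
    private long pythonObject;

    public void pythonExtension(long pythonObject) {
        this.pythonObject = pythonObject;
    }

    public long pythonExtension() {
        return this.pythonObject;
    }

    // Release the Python-side reference when Java collects this object
    public void finalize() throws Throwable {
        pythonDecRef();
    }

    // Implementation provided by JCC
    public native void pythonDecRef();

    public void sendMessage(PythonMessage message) {
        PythonVM vm = PythonVM.get();
        vm.acquireThreadState();
        try {
            receive_message(message);
        } finally {
            // Always release, even if the Python side raises
            vm.releaseThreadState();
        }
    }

    // Implemented in Python
    public native void receive_message(PythonMessage message);
}
                                                                 32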
Hello World - Python part

from montysolr import MontySolrBridge

class SimpleBridge(MontySolrBridge):

    def __init__(self):
        super(SimpleBridge, self).__init__()

    # called from Java through the native receive_message() method
    def receive_message(self, message):
        query = message.getParam('query')
        message.setResults('Hello world!')
        print('Python received from Java: %s' % query)




                                                    33
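The Hello World round trip can be tried without Java or JCC using pure-Python stand-ins. In this sketch FakeMessage and FakeBridge are hypothetical replacements for the Java PythonMessage and the JCC-generated MontySolrBridge base class; only the getParam/setResults/getResults and sendMessage/receive_message names come from the slides.

```python
class FakeMessage(object):
    """Hypothetical stand-in for the Java PythonMessage object."""

    def __init__(self, **params):
        self._params = dict(params)
        self._results = None

    def getParam(self, name):
        return self._params.get(name)

    def setResults(self, results):
        self._results = results

    def getResults(self):
        return self._results


class FakeBridge(object):
    """Hypothetical stand-in for the JCC-generated bridge base class."""

    def sendMessage(self, message):
        # In MontySolr this call crosses from Java into Python via JNI;
        # here it is an ordinary method call.
        self.receive_message(message)

    def receive_message(self, message):
        raise NotImplementedError("implemented on the Python side")


class SimpleBridge(FakeBridge):

    def receive_message(self, message):
        query = message.getParam('query')
        message.setResults('Hello world!')
        print('Python received from Java: %s' % query)


msg = FakeMessage(query='supersymmetry')
SimpleBridge().sendMessage(msg)
print(msg.getResults())
```

The message object is the only thing that crosses the boundary: the caller puts parameters in, the handler writes results back into the same object.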
Example - running MontySolr
- Java side
  - JRE (32/64 bit)
  - Standard Solr/Lucene jars
  - JCC dynamic library
- Python side
  - Python interpreter (32/64 bit)
  - 4 Python modules (jcc, solr, lucene, montysolr)
- In the main thread
  - First we load JCC
  - Then start Python interpreter ...
  - ... load Python handlers

                                                      34
Solr as search service

                         Solr /w Invenio
    Invenio                (backend)
   frontend

                XML




                                      JCC


                                            35
Example
 refersto:author:ellis
                         Solr




  MyCustom
   Handler




                                37
Example - Solr custom handler

    // build a message for the Python handler "Invenio:perform_search"
    PythonMessage msg = MontySolrVM.INSTANCE
        .createMessage("perform_search")
        .setSender("Invenio")
        .setParam("query", "refersto:author:ellis");

    // hand the message over to the Python side
    MontySolrVM.INSTANCE.sendMessage(msg);

    Object result = msg.getResults();
    if (result != null) {
        // the Python handler stored a Java int[] of record IDs
        int[] hits = (int[]) result;
    }


                                                    38
Example - JNI connection
 refersto:author:ellis
                                           Solr




  MyCustom               Python Invenio
   Handler               Bridge wrappers




                                                  40
Example - Python side

    # handler is made 'visible' at startup
    SolrpieTarget('Invenio:perform_search',
         perform_search)



    # search time - called from Java
    def perform_search(message):
        query = message.getParam("query")
        hits = call_real_search(query)
        # cast Python list into Java array
        message.setResults(JArray_ints(hits))




                                                41
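The registration and dispatch idea above can be sketched in plain Python. This is not the MontySolr implementation: Dispatcher, Message and FAKE_INDEX are hypothetical, and the only detail taken from the slides is the 'sender:name' target key ('Invenio:perform_search').

```python
class Message(object):
    """Hypothetical message: parameters in, results out."""

    def __init__(self, **params):
        self.params = dict(params)
        self.results = None


class Dispatcher(object):
    """Maps 'sender:name' targets to Python callables."""

    def __init__(self):
        self._handlers = {}

    def register(self, target, func):
        # done once at startup, so that handlers become 'visible'
        self._handlers[target] = func

    def dispatch(self, sender, name, message):
        key = "%s:%s" % (sender, name)
        handler = self._handlers.get(key)
        if handler is None:
            raise KeyError("no handler registered for %r" % key)
        handler(message)
        return message


# Hypothetical search backend standing in for the real Invenio search
FAKE_INDEX = {"refersto:author:ellis": [4, 8, 15]}


def perform_search(message):
    query = message.params["query"]
    message.results = FAKE_INDEX.get(query, [])


dispatcher = Dispatcher()
dispatcher.register("Invenio:perform_search", perform_search)

reply = dispatcher.dispatch("Invenio", "perform_search",
                            Message(query="refersto:author:ellis"))
print(reply.results)  # [4, 8, 15]
```

Keeping the lookup key as a plain string means the Java side only needs to know a sender and a handler name, not any Python module layout.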
Example
 refersto:author:ellis
                                                Solr

                                           Invenio


                                           Invenio
  MyCustom               Python Invenio
   Handler               Bridge wrappers
                                           Invenio


                                           Invenio



                                                       42
Example - Java side again

    // build a message for the Python handler "Invenio:perform_search"
    PythonMessage msg = MontySolrVM.INSTANCE
        .createMessage("perform_search")
        .setSender("Invenio")
        .setParam("query", "refersto:author:ellis");

    // hand the message over to the Python side
    MontySolrVM.INSTANCE.sendMessage(msg);

    Object result = msg.getResults();
    if (result != null) {
        // the Python handler stored a Java int[] of record IDs
        int[] hits = (int[]) result;
    }


                                                    43
Solr as search service

                         Solr /w Invenio
   Apache                  (backend)
  webserver

                XML

                          Invenio
   Invenio


                                           JCC


                                                 44
Outline

- Context
- The Challenge
- Key components
  - Available technologies
  - Our approach
  - Problems solved
‣ Evaluation
- Wrap-up



                             45
Memory and garbage collection




                                46
Comparing speed and load...




                              47
The effect of cache




                      48
Robust?
- Extensive siege tests show very good
  performance and stability under high load
   - 100-200 users, complex searches
   - 50 concurrent users, citation analysis
   - JCC incurs small overhead
- We detected no memory leaks
   - The same as dbpedia.org
- But watch out for errors in C
   - An error in a C module brings down the whole JVM
   - (errors in a pure Python module can be handled)



                                                      49
Easy to develop/maintain?
- Added complexity
  - Java in the toolbox
  - Need to compile C++ extensions
  - Python/OS version dependencies


- For this we get
  -   Easy integration with Invenio
  -   The best of two applications
  -   A lot of features for free
  -   And we can control Solr from Python!



                                             50
Outline

- Context
- The Challenge
- Key components
  - Available technologies
  - Our approach
  - Problems solved
- Evaluation
‣ Wrap-up



                             51
Wrap-up
- Our challenge was to connect two different
  languages/systems
- And we wanted to get the best of the two...
   - So we had to plug Python into Solr
   - And now our Solr knows citation analysis!
- We created MontySolr extension
   -   Robust, tested (will be used by INSPIRE)
   -   Works for any Python application (e.g. Django)
   -   And for any C/C++ app that Python understands!
   -   Free software license
- Try it out! Help us make it better!
   - https://github.com/romanchyla/montysolr
                                                        52
Questions?
- MontySolr
  - https://github.com/romanchyla/montysolr

- Roman Chyla
  -   Fellow, CERN Scientific Information Service
  -   roman.chyla@cern.ch
  -   @rchyla
  -   https://svnweb.cern.ch/trac/rcarepo
Additional information




                         54
Links
- Invenio platform
   - http://invenio-software.org/
- INSPIRE Digital library
   - http://inspirebeta.net/
- Diagrams of JCC and JEPP
   - Andreas Schreiber : Mixing Java and Python
   - http://www.slideshare.net/onyame/mixing-python-and-java
- On Jython C Extension API
   - http://stackoverflow.com/questions/3097466/using-numpy-and-cpython-with-jython
- Demo of a running service:
   - http://insdev01.cern.ch                               55
#1 - How to embed Solr (standard)
- solr.client.solrj.embedded.EmbeddedSolrServer




                                                  56
#2 - How to embed Solr (simplified)
- solr.servlet.DirectSolrConnection
- like previous, but simpler
- all the queries are sent as strings, everything is
  just a string
- very flexible and probably suitable for quick
  integration




                                                       57
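Since everything is just a string in this approach, a request can be assembled with ordinary string tools. The sketch below assumes a request is expressed as a handler path plus URL-encoded parameters, as in Solr's HTTP interface; build_request is a hypothetical helper, while q, fl and rows are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_request(path, **params):
    """Return 'path?key=value&...' with parameters sorted by name."""
    return "%s?%s" % (path, urlencode(sorted(params.items())))


request = build_request("/select",
                        q="fulltext:supersymmetry",
                        fl="id",
                        rows=10)
print(request)  # /select?fl=id&q=fulltext%3Asupersymmetry&rows=10
```

The response comes back as a string too, so the caller still has to parse it, which is one reason this route suits quick integration more than heavy result transfer.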
#3 - Example of a Solr custom handler




                                        58
#4 - Example Python handler




                              59

Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Kürzlich hochgeladen

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Kürzlich hochgeladen (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Lucene Revolution 2011: MontySolr presentation

  • 1. MontySolr: Embedding CPython in Solr Roman Chyla, CERN roman.chyla@cern.ch, May 26, 2011
  • 2. Why should I care? - Our challenge is to connect Python and Java - Without compromises - We created MontySolr extension - Robust, tested (will be used by our system) - But works for any Python application (eg. Django) - And for any C/C++ app that Python understands! - Open source (GPL v2) - Try it out! - https://github.com/romanchyla/montysolr 2
  • 3. Outline ‣ Context - The Challenge - Key components - Available technologies - Our approach - Problems solved - Evaluation - Wrap-up 3
  • 4. CERN - European Organization for Nuclear Research - Switzerland, Geneva - The largest laboratory for High Energy Physics - Home to the Large Hadron Collider - 40-50K HEP scientists worldwide 4
  • 12. SPIRES - Stanford Linear Accelerator Center - SLAC - High-Energy Physics Literature Database - Started December 1991 - The first web outside Europe/CERN - The first database on web 5
  • 16. Invenio - Integrated digital library software behind INSPIRE - Used by very large institutional repositories - http://repositories.webometrics.info/toprep_inst.asp - Customizable virtual collections - Flexible management of metadata - 3 000 authors per article - Powerful search engine - Incl. citation map analysis - Written in Python (since 2001) - 290 000 lines of code 8
  • 17. Outline - Context ‣ The Challenge - Key components - Available technologies - Our approach - Problems solved - Evaluation - Wrap-up 9
  • 18. The Challenge - HEP scientific community - Searches metadata oriented - However fulltexts are changing the situation - And we want to provide even better service - Bigger volumes of data - NLP processing - Semantic search 10
  • 19. The Challenge (diagram) - Query sent to Invenio: supersymmetry AND author:ellis - Sub-query passed on: fulltext:supersymmetry - Returned: IDs: 1;2;3;9.... (1-6M IDs) - Problems: 1. only IDs, no score = no ranking - 2. score merging difficult (if no score available) - 3. push IDs? (e.g. faceting)
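A minimal sketch of the merging problem above, in plain Python with made-up IDs and scores. The function names are illustrative and are not part of Invenio or Solr. With IDs only, the two hit lists can be intersected but not ranked; ranking becomes possible once one side supplies a score per ID.

```python
def merge_hits(metadata_ids, fulltext_ids):
    """Intersect two hit lists that carry IDs only (AND semantics).

    No score is available, so the only defensible order is by ID.
    """
    return sorted(set(metadata_ids) & set(fulltext_ids))


def merge_scored_hits(metadata_ids, fulltext_scores):
    """Intersect and rank when the full-text side returns {id: score}.

    Higher score first; ties are broken by ID to keep the order stable.
    """
    common = set(metadata_ids) & set(fulltext_scores)
    return sorted(common, key=lambda doc_id: (-fulltext_scores[doc_id], doc_id))


if __name__ == "__main__":
    # Invented example data, for illustration only.
    print(merge_hits([1, 2, 3, 9], [2, 3, 5, 9]))
    print(merge_scored_hits([1, 2, 3, 9], {2: 0.4, 3: 1.7, 5: 2.0, 9: 0.9}))
```

With millions of IDs per query, as on the slide, building and intersecting such sets for every request is exactly the transfer cost the talk sets out to avoid.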
  • 30. What is the “best” solution? - We love Python... - ...and our applications are written in Python... - But what if Solr is the master search engine? - Merge results inside Solr? - Typical size: 1-10 mil. IDs - Expected latency: 1-2 s. - What we want to achieve: - Fast transfer of hits from Invenio to Solr - Leverage the power of both (no compromises) - Developer-friendly integration, simplicity - Additional concerns: 12
  • 31. Outline - Context - The Challenge ‣ Key components - Available technologies - Our approach - Evaluation - Demonstration - Wrap-up 13
  • 32. To embed Solr (in Java app) - Your app simulates Java web container? - use EmbeddedSolrServer - It knows nothing about Java servlets? - use the DirectSolrConnection class - Maybe we are too lazy? - Embed the web container (in my case Jetty) - Seemed strange (webserver inside webserver) - ... but it worked well
  • 37. To use Solr in non-Java app - Solr is already usable via HTTP requests, but we need something else here... - Remote objects/calls? - Pyro, execnet, CORBA, SOAP... - or simply pipes? - Access Python from Java? - Jython - JEPP - Access Java from Python? - JPype - JCC 15
  • 38. Jython? - Implementation of Python in 100% Java - Both Java and Python code - Truly multithreaded - C modules will not work - but see http://bit.ly/iTRYbb - Slower than CPython 16
  • 41. JEPP - Java Embedded Python - Python code runs inside Python interpreter - Embeds CPython interpreter via Java Native Interface (JNI) in Java - http://jepp.sourceforge.net/ - recently updated (27-Jan) - but JCC is more active 18
  • 42. JEPP - Java Embedded Python 19
  • 43. JCC - Embeds JVM in Python - C++ code generator - C++ object interface wraps a Java library - C++ wrappers conform to Python's C type system - result: complete Python extension module 20
  • 44. JCC 21
  • 47. To use Solr in non-Java app - Columns: Jython, JCC, JEPP - Python C modules: ✓ ✓ - Speed: ✓ ? - No code changes: ✓ ✓ - Access from Python: ✓ ✓ - Access from Java: ✓ ... ✓
  • 48. The first try Invenio Solr JCC 23
  • 49. Devil is in details... 24
  • 50. GIL - Global Interpreter Lock Unfortunately Python webapp is not like Java... 25
  • 51. GIL - Global Interpreter Lock We can have 200 threads, but only 4 will run at a time...
  • 52. GIL - Global Interpreter Lock 27
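A small sketch for observing the effect described on the GIL slides, assuming a standard CPython build with the GIL enabled. It runs the same pure-Python CPU-bound loop sequentially and in threads and reports wall-clock time for both. The function names and workload sizes are arbitrary choices for illustration; on a GIL build the threaded run is typically no faster than the sequential one, but exact timings depend on the machine.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; it needs the GIL for every step."""
    while n > 0:
        n -= 1
    return n


def run_sequential(jobs, n):
    """Run the loop `jobs` times one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(jobs, n):
    """Run the loop in `jobs` threads at once; return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(jobs)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential: %.2fs" % run_sequential(4, 2000000))
    print("4 threads:  %.2fs" % run_threaded(4, 2000000))
```

This is why the talk pairs threading on the Java side with multiprocessing on the Python side.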
  • 53. Fortunately a solution exists - JCC can embed Python inside Java - Special thanks to Andi Vajda! (JCC creator) - We write ‘empty’ classes in Java ... - ... and implement them in Python - Diagram: Python w/ Java inside; Java w/ Python inside
  • 54. The second try (diagram) - Solr w/ Invenio (backend) - Invenio frontend - Links: XML, JCC
  • 55. Implementing the bridge - Special Java class - With method pythonExtension() - Native method pythonDecRef() - JCC provides its implementation - And number of other native methods - These will be implemented using Python - Like writing JNI Java/C code but without compilation... 30
  • 56. MontySolr extension - JCC has great potential, but also added complexity... - So the MontySolr project was born - Modules must be built in shared mode - JCC dynamic library loaded and started from the main thread - Simple mechanism of the Python bridge and message - Configurable handlers on the Python side - Secured dereferencing of the native objects - Threading on the Java side - Multiprocessing on the Python side - Easy ant targets (compilation) ... 31
  • 57. Hello World - Java part

      public class MontySolrBridge extends BasicBridge implements PythonBridge {

          // reference to the Python object that implements this class
          private long pythonObject;

          public void pythonExtension(long pythonObject) {
              this.pythonObject = pythonObject;
          }

          public long pythonExtension() {
              return this.pythonObject;
          }

          // release the Python object when the Java object is collected
          public void finalize() throws Throwable {
              pythonDecRef();
          }

          public native void pythonDecRef();

          public void sendMessage(PythonMessage message) {
              PythonVM vm = PythonVM.get();
              vm.acquireThreadState();
              try {
                  receive_message(message);
              } finally {
                  // always give the thread state back, even if Python raised
                  vm.releaseThreadState();
              }
          }

          // implemented on the Python side
          public native void receive_message(PythonMessage message);
      }
  • 58. Hello World - Python part

      from montysolr import MontySolrBridge

      class SimpleBridge(MontySolrBridge):

          def __init__(self):
              super(SimpleBridge, self).__init__()

          # called from Java through the native receive_message() method
          def receive_message(self, message):
              query = message.getParam('query')
              message.setResults('Hello world!')
              print('Python received from Java: %s' % query)
  • 59. Example - running MontySolr - Java side - JRE (32/64 bit) - Standard Solr/Lucene jars - JCC dynamic library - Python side - Python interpreter (32/64 bit) - 4 Python modules (jcc, solr, lucene, montysolr) - In the main thread - First we load JCC - Then start Python interpreter ... - ... load Python handlers 34
  • 60. Solr as search service (diagram) - Solr w/ Invenio (backend) - Invenio frontend - Links: XML, JCC
  • 61. Example (diagram) - Query: refersto:author:ellis - Solr: MyCustom Handler
  • 63. Example - Solr custom handler

      // build a message addressed to the Python handler "perform_search"
      PythonMessage msg = MontySolrVM.INSTANCE
          .createMessage("perform_search")
          .setSender("Invenio")
          .setParam("query", "refersto:author:ellis");

      // hand the message over to the Python side
      MontySolrVM.INSTANCE.sendMessage(msg);

      // the Python handler stores its answer in the same message
      Object result = msg.getResults();
      if (result != null) {
          int[] hits = (int[]) result;
      }
  • 64. Example - JNI connection (diagram) - Query: refersto:author:ellis - Solr: MyCustom Handler - Python Bridge - Invenio wrappers
  • 66. Example - Python side

      # search time - called from Java
      def perform_search(message):
          query = message.getParam("query")
          hits = call_real_search(query)
          # cast Python list into Java array
          message.setResults(JArray_ints(hits))

      # handler is made 'visible' at startup
      SolrpieTarget('Invenio:perform_search', perform_search)
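The registration and dispatch idea on this slide can be sketched in pure Python without a JVM. Message and Dispatcher below are hypothetical stand-ins written for this example; the real PythonMessage and SolrpieTarget belong to MontySolr and may behave differently. FAKE_INDEX is invented data that replaces the real Invenio search call.

```python
class Message(object):
    """Stand-in for the Java-side PythonMessage (hypothetical)."""

    def __init__(self, sender, recipient, **params):
        self.sender = sender
        self.recipient = recipient
        self._params = dict(params)
        self._results = None

    def getParam(self, name):
        return self._params.get(name)

    def setResults(self, results):
        self._results = results

    def getResults(self):
        return self._results


class Dispatcher(object):
    """Maps 'sender:recipient' keys to Python callables (hypothetical)."""

    def __init__(self):
        self._handlers = {}

    def register(self, target, handler):
        self._handlers[target] = handler

    def send(self, message):
        key = "%s:%s" % (message.sender, message.recipient)
        handler = self._handlers.get(key)
        if handler is None:
            raise KeyError("no handler registered for %r" % key)
        handler(message)
        return message


# Invented data standing in for the real search backend.
FAKE_INDEX = {"refersto:author:ellis": [1, 2, 3, 9]}


def perform_search(message):
    """Look the query up and store the hits in the message."""
    query = message.getParam("query")
    message.setResults(FAKE_INDEX.get(query, []))


if __name__ == "__main__":
    dispatcher = Dispatcher()
    dispatcher.register("Invenio:perform_search", perform_search)
    msg = Message("Invenio", "perform_search", query="refersto:author:ellis")
    print(dispatcher.send(msg).getResults())
```

The caller reads the answer from the same message object it sent, which mirrors how the Java handler calls getResults() after sendMessage().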
  • 67. Example (diagram) - Query: refersto:author:ellis - Solr: MyCustom Handler - Python Bridge - Invenio wrappers - Invenio (x4)
  • 68. Example - Java side again - Back in the Solr custom handler, sendMessage(msg) has returned - The results set by the Python handler are read with msg.getResults() - If not null, they are cast to int[] hits
  • 69. Solr as search service (diagram) - Solr w/ Invenio (backend) - Apache webserver - Invenio (x2) - Links: XML, JCC
  • 70. Outline - Context - The Challenge - Key components - Available technologies - Our approach - Problems solved ‣ Evaluation - Wrap-up 45
  • 71. Memory and garbage collection 46
  • 72. Comparing speed and load... 47
  • 73. The effect of cache 48
  • 74. Robust? - Extensive siege tests show very good performance and stability under high load - 100-200 users, complex searches - 50 concurrent users, citation analysis - JCC incurs small overhead - We detected no memory leaks - The same as dbpedia.org - But watch out for errors in C - An error in C module brings down the whole JVM - (errors in pure Python module can be handled) 49
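The last point, that errors in a pure Python module can be handled, can be illustrated with a small wrapper. This is a sketch under assumptions: the message is modelled as a plain dict and call_safely is a hypothetical helper, not MontySolr code. A crash inside a C extension cannot be caught this way, which is the case the slide warns about.

```python
import traceback


def call_safely(handler, message):
    """Run a Python handler; on failure, record the error in the message.

    Only Python-level exceptions are caught here. A crash inside a C
    module would still take the whole process down.
    """
    try:
        handler(message)
    except Exception:
        message["error"] = traceback.format_exc()
        message["results"] = None
    return message


def good_handler(message):
    message["results"] = [1, 2, 3]


def bad_handler(message):
    raise ValueError("boom")


if __name__ == "__main__":
    print(call_safely(good_handler, {}))
    print("error" in call_safely(bad_handler, {}))
```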
  • 75. Easy to develop/maintain? - Added complexity - Java in the toolbox - Need to compile C++ extensions - Python/OS version dependencies - For this we get - Easy integration with Invenio - The best of two applications - A lot of features for free - And we can control Solr from Python! 50
  • 76. Outline - Context - The Challenge - Key components - Available technologies - Our approach - Problems solved - Evaluation ‣ Wrap-up 51
  • 77. Wrap-up - Our challenge was to connect two different languages/systems - And we wanted to get the best of the two... - So we had to plug Python into Solr - And now our Solr knows citation analysis! - We created MontySolr extension - Robust, tested (will be used by INSPIRE) - Works for any Python application (eg. Django) - And for any C/C++ app that Python understands! - Free software license - Try it out! Help us make it better! - https://github.com/romanchyla/montysolr 52
  • 78. Questions? - MontySolr - https://github.com/romanchyla/montysolr - Roman Chyla - Fellow, CERN Scientific Information Service - roman.chyla@cern.ch - @rchyla - https://svnweb.cern.ch/trac/rcarepo
  • 80. Links - Invenio platform - http://invenio-software.org/ - INSPIRE Digital library - http://inspirebeta.net/ - Diagrams of JCC and JEPP - Andreas Schreiber: Mixing Java and Python - http://www.slideshare.net/onyame/mixing-python-and-java - On Jython C Extension API - http://stackoverflow.com/questions/3097466/using-numpy-and-cpython-with-jython - Demo of a running service: - http://insdev01.cern.ch
  • 81. #1 - How to embed Solr (standard) - solr.client.solrj.embedded.EmbeddedSolrServer 56
  • 82. #2 - How to embed Solr (simplified) - solr.servlet.DirectSolrConnection - like previous, but simpler - all the queries are sent as strings, everything is just a string - very flexible and probably suitable for quick integration 57
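Since everything passed to DirectSolrConnection is just a string, the caller's main job is composing the request string correctly. DirectSolrConnection itself is a Java class; the Python 3 sketch below only shows the string composition. The helper name is hypothetical, and the /select path with the q, rows, fl and wt parameters is assumed to match a default Solr configuration.

```python
from urllib.parse import urlencode


def build_select_request(query, rows=10, fields="id,score"):
    """Compose a Solr path-and-parameters string with URL-encoded values."""
    params = [("q", query), ("rows", rows), ("fl", fields), ("wt", "json")]
    return "/select?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_request("fulltext:supersymmetry"))
```

Encoding the values matters because Solr query syntax uses characters such as the colon that are not safe in a raw query string.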
  • 84. #3 - Example of a Solr custom handler 58
  • 85. #4 - Example Python handler 59

Editor's notes

  13. mention the transition/collaboration: CERN-DESY-Fermilab-SLAC
  14. paradigm of a full result set
  27. Python: fast-prototyping, easy for students (who write a lot of the code)
  29. X - do not spend time on the code. “I was waiting for the point ‘this is the solution’” :)
      ad #1: solr.client.solrj.embedded.EmbeddedSolrServer
        - Solr is running as an embedded process, not inside a servlet container
        - the default/recommended way
      ad #2: solr.servlet.DirectSolrConnection
        - like previous, but simpler
        - all the queries are sent as strings, everything is just a string
        - very flexible and probably suitable for quick integration
  33. I don’t mention some options like writing JNI ourselves or using intermediaries other than remote objects (e.g. shared memory, if that would be possible)
  34. everybody thinks Jython, right? No!
  41. These are only some of the important features; omitted are simplicity and beauty (JEPP eval is just an ugly way of doing things), documentation, community, support etc.
  44. Make sure that it is clear that processes can have threads - here it is not clear what is a process and what is a thread (it is not visible)
  55. Truly bi-directional: we can call Python functions and pass Java objects; from inside Python we can call Java objects/methods
  57. the real-code example is in appendix #3
  60. the real code is in appendix #4
  61. note: don’t forget to mention how multiprocessing saves memory on Linux systems (due to copy-on-write when forking). This is effectively an alternative to Python WSGI that cannot run multiprocessing. We show that it is possible to use multiprocessing effectively.
  62. the real code is in appendix #3
  64. more precise - montysolr intro (include)
  72. TODO: Invenio is the same as Django. Today, Solr can now do 2nd order operations