SlideShare ist ein Scribd-Unternehmen logo
1 von 13
Downloaden Sie, um offline zu lesen
Search Engine Back End API
Solution for Fast Prototyping
https://github.com/jimmylai/
search_engine_backend_template

Introduction and Tutorial
Jimmy Lai
r97922028 [at] ntu.edu.tw
http://tw.linkedin.com/pub/jimmy-lai/27/4a/536
2013/01/23
Requirements:
•  A Backend API provides:
1. 
2. 
3. 
4. 
5. 

Full-text search
Spatial search
RESTful API
Input validation
Output rendering

•  Simple and easy for fast prototyping

LDSP

2
LDSP:
Linux + Django + Solr + Python
•  Django: API serving
–  Input validation
–  Data mash-up
–  Output rendering

•  Solr: Indexing and searching
–  Schema
–  Indexing
–  Query
LDSP

3
Automation:
Shell command + Fabric
•  Installation:
–  Solr
–  Python packages: django, djangorestframework,
pysolr

•  Start services:
–  Solr
–  Django standalone server

•  Data:
–  Create Solr core
–  Feed data
LDSP

4
[Tutorial] Requirement
•  Example Dataset: Taiwan movie dataset
http://data.gov.tw/node/7731

•  Data source
http://nrchbms.culture.tw/OpenData/API/iCultureAPI.aspx?
type=26&radius=100&format=json

•  We’d like to build a geo search API to
find nearby movie items given a
location

LDSP

5
[Tutorial] Data preprocessing (1/2)
•  Original data
•  {"sno":"29277","mainTitle":"「新南光」電影
《孫悟空遊台灣》,描寫孫悟空化為⼤大學⽣生的
外型遊台灣","year":"未
知","format":"508x640pixels","subject":"戲劇
電影歌仔戲","date":"創作⽇日期:⺠民國時期⺠民國
時期","sourcesite":"http://nrch.cca.gov.tw/
ccahome/photo/photo_meta.jsp?
xml_id=0005780574&dofile=cca100009-hptmtp0281_00-0001w.jpg","px":"120.49","py":"23.853"}
LDSP

6
[Tutorial] Data preprocessing (2/2)
• 
• 
• 
• 
• 
• 
• 
• 
• 
• 
• 
• 
• 

Preprocessed data: normalized the location format
The preprocessed data is stored as data/movie.json
{
"format": "508x640pixels",
"sno": "29277",
"sourcesite": "http://nrch.cca.gov.tw/ccahome/photo/photo_meta.jsp?
xml_id=0005780574&dofile=cca100009-hp-tmtp0281_00-0001-w.jpg",
"mainTitle": "「新南光」電影《孫悟空遊台灣》,描寫孫悟空化為⼤大學⽣生的
外型遊台灣",
"location": "23.853,120.49",
"year": "未知",
"date": "創作⽇日期:⺠民國時期⺠民國時期",
"id": "29277",
"subject": "戲劇電影歌仔戲"
},

LDSP

7
[Tutorial] Data indexing
• 
• 
• 
• 
• 

pip install fabric
fab setup_env
fab create_core:movie
fab run_solr
fab feed_data:movie,data/movie.json
–  Core name as the 1st parameter, data file name
as the 2nd parameter

•  fab run_django
•  We are done!
LDSP

8
[Tutorial] Solr Admin

LDSP

9
[Tutorial] Djangorestframework UI

LDSP

10
[Tutorial] BE API Query, json output

LDSP

11
[Tutorial] BE API Query, xml output

LDSP

12
The next steps
•  Understand the mechanism of:
–  Solr:
•  Update schema
•  Update partial data
•  Query language of Solr

–  Djangorestframework:
•  Add more parameters
•  Input validation
•  Output rendering

•  Deploy to production environment
LDSP

13

Weitere ähnliche Inhalte

Was ist angesagt?

Python WSGI introduction
Python WSGI introductionPython WSGI introduction
Python WSGI introductionAgeeleshwar K
 
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...Big Data Spain
 
Google App Engine With Java And Groovy
Google App Engine With Java And GroovyGoogle App Engine With Java And Groovy
Google App Engine With Java And GroovyKen Kousen
 
Kube-AWS
Kube-AWSKube-AWS
Kube-AWSCoreOS
 
Infrastructure as Code - Terraform - Devfest 2018
Infrastructure as Code - Terraform - Devfest 2018Infrastructure as Code - Terraform - Devfest 2018
Infrastructure as Code - Terraform - Devfest 2018Mathieu Herbert
 
CoreOS in a Nutshell
CoreOS in a NutshellCoreOS in a Nutshell
CoreOS in a NutshellCoreOS
 
Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderSadayuki Furuhashi
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSematext Group, Inc.
 
Python Deployment with Fabric
Python Deployment with FabricPython Deployment with Fabric
Python Deployment with Fabricandymccurdy
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersSematext Group, Inc.
 
Transforming Infrastructure into Code - Importing existing cloud resources u...
Transforming Infrastructure into Code  - Importing existing cloud resources u...Transforming Infrastructure into Code  - Importing existing cloud resources u...
Transforming Infrastructure into Code - Importing existing cloud resources u...Shih Oon Liong
 
Data integration with embulk
Data integration with embulkData integration with embulk
Data integration with embulkTeguh Nugraha
 
Node collaboration - sharing information between your systems
Node collaboration - sharing information between your systemsNode collaboration - sharing information between your systems
Node collaboration - sharing information between your systemsm_richardson
 
How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...Yevgeniy Brikman
 
MySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKMySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKYoungHeon (Roy) Kim
 
Altitude NY 2018: Programming the edge workshop
Altitude NY 2018: Programming the edge workshopAltitude NY 2018: Programming the edge workshop
Altitude NY 2018: Programming the edge workshopFastly
 
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverAltitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverFastly
 
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종NAVER D2
 

Was ist angesagt? (20)

Python WSGI introduction
Python WSGI introductionPython WSGI introduction
Python WSGI introduction
 
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
 
Google App Engine With Java And Groovy
Google App Engine With Java And GroovyGoogle App Engine With Java And Groovy
Google App Engine With Java And Groovy
 
Kube-AWS
Kube-AWSKube-AWS
Kube-AWS
 
Infrastructure as Code - Terraform - Devfest 2018
Infrastructure as Code - Terraform - Devfest 2018Infrastructure as Code - Terraform - Devfest 2018
Infrastructure as Code - Terraform - Devfest 2018
 
CoreOS in a Nutshell
CoreOS in a NutshellCoreOS in a Nutshell
CoreOS in a Nutshell
 
Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loader
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for You
 
Python Deployment with Fabric
Python Deployment with FabricPython Deployment with Fabric
Python Deployment with Fabric
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Transforming Infrastructure into Code - Importing existing cloud resources u...
Transforming Infrastructure into Code  - Importing existing cloud resources u...Transforming Infrastructure into Code  - Importing existing cloud resources u...
Transforming Infrastructure into Code - Importing existing cloud resources u...
 
Data integration with embulk
Data integration with embulkData integration with embulk
Data integration with embulk
 
Node collaboration - sharing information between your systems
Node collaboration - sharing information between your systemsNode collaboration - sharing information between your systems
Node collaboration - sharing information between your systems
 
How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...
 
MySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKMySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELK
 
Altitude NY 2018: Programming the edge workshop
Altitude NY 2018: Programming the edge workshopAltitude NY 2018: Programming the edge workshop
Altitude NY 2018: Programming the edge workshop
 
Embuk internals
Embuk internalsEmbuk internals
Embuk internals
 
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverAltitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
 
Scripting Embulk Plugins
Scripting Embulk PluginsScripting Embulk Plugins
Scripting Embulk Plugins
 
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
 

Andere mochten auch

Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Jimmy Lai
 
Fast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython NotebookFast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython NotebookJimmy Lai
 
[LDSP] Solr Usage
[LDSP] Solr Usage[LDSP] Solr Usage
[LDSP] Solr UsageJimmy Lai
 
Data Analyst Nanodegree
Data Analyst NanodegreeData Analyst Nanodegree
Data Analyst NanodegreeJimmy Lai
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012Jimmy Lai
 
Documentation with sphinx @ PyHug
Documentation with sphinx @ PyHugDocumentation with sphinx @ PyHug
Documentation with sphinx @ PyHugJimmy Lai
 
Software development practices in python
Software development practices in pythonSoftware development practices in python
Software development practices in pythonJimmy Lai
 
When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012Jimmy Lai
 
Apache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesApache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesJimmy Lai
 
Build a Searchable Knowledge Base
Build a Searchable Knowledge BaseBuild a Searchable Knowledge Base
Build a Searchable Knowledge BaseJimmy Lai
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Jimmy Lai
 
Nltk natural language toolkit overview and application @ PyHug
Nltk  natural language toolkit overview and application @ PyHugNltk  natural language toolkit overview and application @ PyHug
Nltk natural language toolkit overview and application @ PyHugJimmy Lai
 
NetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugNetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugJimmy Lai
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learnJimmy Lai
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Jimmy Lai
 

Andere mochten auch (15)

Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...
 
Fast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython NotebookFast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython Notebook
 
[LDSP] Solr Usage
[LDSP] Solr Usage[LDSP] Solr Usage
[LDSP] Solr Usage
 
Data Analyst Nanodegree
Data Analyst NanodegreeData Analyst Nanodegree
Data Analyst Nanodegree
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012
 
Documentation with sphinx @ PyHug
Documentation with sphinx @ PyHugDocumentation with sphinx @ PyHug
Documentation with sphinx @ PyHug
 
Software development practices in python
Software development practices in pythonSoftware development practices in python
Software development practices in python
 
When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012
 
Apache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesApache thrift-RPC service cross languages
Apache thrift-RPC service cross languages
 
Build a Searchable Knowledge Base
Build a Searchable Knowledge BaseBuild a Searchable Knowledge Base
Build a Searchable Knowledge Base
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013
 
Nltk natural language toolkit overview and application @ PyHug
Nltk  natural language toolkit overview and application @ PyHugNltk  natural language toolkit overview and application @ PyHug
Nltk natural language toolkit overview and application @ PyHug
 
NetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugNetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHug
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learn
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
 

Ähnlich wie [LDSP] Search Engine Back End API Solution for Fast Prototyping

rest3d Web3D 2014
rest3d Web3D 2014rest3d Web3D 2014
rest3d Web3D 2014Remi Arnaud
 
Druid at naver.com - part 1
Druid at naver.com - part 1Druid at naver.com - part 1
Druid at naver.com - part 1Jungsu Heo
 
Developing Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & PythonDeveloping Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & PythonSmartBear
 
Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comJungsu Heo
 
Barcamp Bangkhen :: Robot Framework
Barcamp Bangkhen :: Robot FrameworkBarcamp Bangkhen :: Robot Framework
Barcamp Bangkhen :: Robot FrameworkSomkiat Puisungnoen
 
Rapid application development with spring roo j-fall 2010 - baris dere
Rapid application development with spring roo   j-fall 2010 - baris dereRapid application development with spring roo   j-fall 2010 - baris dere
Rapid application development with spring roo j-fall 2010 - baris dereBaris Dere
 
マイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPCマイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPCdisc99_
 
release_python_day3_slides_201606.pdf
release_python_day3_slides_201606.pdfrelease_python_day3_slides_201606.pdf
release_python_day3_slides_201606.pdfPaul Yang
 
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk""Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"Rinaldi Rampen
 
Apache Solr - search for everyone!
Apache Solr - search for everyone!Apache Solr - search for everyone!
Apache Solr - search for everyone!Jaran Flaath
 
Apex on Local - Better Alternative to Salesforce DX
Apex on Local - Better Alternative to Salesforce DXApex on Local - Better Alternative to Salesforce DX
Apex on Local - Better Alternative to Salesforce DXtzm_freedom
 
Inside Of Mbga Open Platform
Inside Of Mbga Open PlatformInside Of Mbga Open Platform
Inside Of Mbga Open PlatformHideo Kimura
 
Boost on!!next generation big data platform
Boost on!!next generation big data platformBoost on!!next generation big data platform
Boost on!!next generation big data platformLINE Corporation
 
Icinga 2009 at OSMC
Icinga 2009 at OSMCIcinga 2009 at OSMC
Icinga 2009 at OSMCIcinga
 
Contributing to rails
Contributing to railsContributing to rails
Contributing to railsLukas Eppler
 
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...SPS Paris
 

Ähnlich wie [LDSP] Search Engine Back End API Solution for Fast Prototyping (20)

rest3d Web3D 2014
rest3d Web3D 2014rest3d Web3D 2014
rest3d Web3D 2014
 
Druid at naver.com - part 1
Druid at naver.com - part 1Druid at naver.com - part 1
Druid at naver.com - part 1
 
Developing Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & PythonDeveloping Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & Python
 
Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.com
 
Barcamp Bangkhen :: Robot Framework
Barcamp Bangkhen :: Robot FrameworkBarcamp Bangkhen :: Robot Framework
Barcamp Bangkhen :: Robot Framework
 
Angular2 inter3
Angular2 inter3Angular2 inter3
Angular2 inter3
 
Norikra Recent Updates
Norikra Recent UpdatesNorikra Recent Updates
Norikra Recent Updates
 
Rapid application development with spring roo j-fall 2010 - baris dere
Rapid application development with spring roo   j-fall 2010 - baris dereRapid application development with spring roo   j-fall 2010 - baris dere
Rapid application development with spring roo j-fall 2010 - baris dere
 
マイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPCマイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPC
 
release_python_day3_slides_201606.pdf
release_python_day3_slides_201606.pdfrelease_python_day3_slides_201606.pdf
release_python_day3_slides_201606.pdf
 
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk""Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
 
Apache Solr - search for everyone!
Apache Solr - search for everyone!Apache Solr - search for everyone!
Apache Solr - search for everyone!
 
Apex on Local - Better Alternative to Salesforce DX
Apex on Local - Better Alternative to Salesforce DXApex on Local - Better Alternative to Salesforce DX
Apex on Local - Better Alternative to Salesforce DX
 
JRubyConf 2009
JRubyConf 2009JRubyConf 2009
JRubyConf 2009
 
Inside Of Mbga Open Platform
Inside Of Mbga Open PlatformInside Of Mbga Open Platform
Inside Of Mbga Open Platform
 
Boost on!!next generation big data platform
Boost on!!next generation big data platformBoost on!!next generation big data platform
Boost on!!next generation big data platform
 
Icinga 2009 at OSMC
Icinga 2009 at OSMCIcinga 2009 at OSMC
Icinga 2009 at OSMC
 
Planidoo & Zotonic
Planidoo & ZotonicPlanidoo & Zotonic
Planidoo & Zotonic
 
Contributing to rails
Contributing to railsContributing to rails
Contributing to rails
 
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 

Kürzlich hochgeladen (20)

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 

[LDSP] Search Engine Back End API Solution for Fast Prototyping

  • 1. Search Engine Back End API Solution for Fast Prototyping https://github.com/jimmylai/ search_engine_backend_template Introduction and Tutorial Jimmy Lai r97922028 [at] ntu.edu.tw http://tw.linkedin.com/pub/jimmy-lai/27/4a/536 2013/01/23
  • 2. Requirements: •  A Backend API provides: 1.  2.  3.  4.  5.  Full-text search Spatial search RESTful API Input validation Output rendering •  Simple and easy for fast prototyping LDSP 2
  • 3. LDSP: Linux + Django + Solr + Python •  Django: API serving –  Input validation –  Data mash-up –  Output rendering •  Solr: Indexing and searching –  Schema –  Indexing –  Query LDSP 3
  • 4. Automation: Shell command + Fabric •  Installation: –  Solr –  Python packages: django, djangorestframework, pysolr •  Start services: –  Solr –  Django standalone server •  Data: –  Create Solr core –  Feed data LDSP 4
  • 5. [Tutorial] Requirement •  Example Dataset: Taiwan movie dataset http://data.gov.tw/node/7731 •  Data source http://nrchbms.culture.tw/OpenData/API/iCultureAPI.aspx? type=26&radius=100&format=json •  We’d like to build a geo search API to find nearby movie items given a location LDSP 5
  • 6. [Tutorial] Data preprocessing (1/2) •  Original data •  {"sno":"29277","mainTitle":"「新南光」電影 《孫悟空遊台灣》,描寫孫悟空化為⼤大學⽣生的 外型遊台灣","year":"未 知","format":"508x640pixels","subject":"戲劇 電影歌仔戲","date":"創作⽇日期:⺠民國時期⺠民國 時期","sourcesite":"http://nrch.cca.gov.tw/ ccahome/photo/photo_meta.jsp? xml_id=0005780574&dofile=cca100009-hptmtp0281_00-0001w.jpg","px":"120.49","py":"23.853"} LDSP 6
  • 7. [Tutorial] Data preprocessing (2/2) •  •  •  •  •  •  •  •  •  •  •  •  •  Preprocessed data: normalized the location format The preprocessed data is stored as data/movie.json { "format": "508x640pixels", "sno": "29277", "sourcesite": "http://nrch.cca.gov.tw/ccahome/photo/photo_meta.jsp? xml_id=0005780574&dofile=cca100009-hp-tmtp0281_00-0001-w.jpg", "mainTitle": "「新南光」電影《孫悟空遊台灣》,描寫孫悟空化為⼤大學⽣生的 外型遊台灣", "location": "23.853,120.49", "year": "未知", "date": "創作⽇日期:⺠民國時期⺠民國時期", "id": "29277", "subject": "戲劇電影歌仔戲" }, LDSP 7
  • 8. [Tutorial] Data indexing •  •  •  •  •  pip install fabric fab setup_env fab create_core:movie fab run_solr fab feed_data:movie,data/movie.json –  Core name as the 1st parameter, data file name as the 2nd parameter •  fab run_django •  We are done! LDSP 8
  • 11. [Tutorial] BE API Query, json output LDSP 11
  • 12. [Tutorial] BE API Query, xml output LDSP 12
  • 13. The next steps •  Understand the mechanism of: –  Solr: •  Update schema •  Update partial data •  Query language of Solr –  Djangorestframework: •  Add more parameters •  Input validation •  Output rendering •  Deploy to production environment LDSP 13