Do Not Crawl in the DUST: Different URLs, Similar Text
Uploaded by George Ang (46 slides)
1. Do Not Crawl in the DUST: Different URLs, Similar Text
Uri Schonfeld, Department of Electrical Engineering, Technion
Joint work with Dr. Ziv Bar-Yossef and Dr. Idit Keidar
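The title's premise is that many different URLs lead to (nearly) the same text, so a crawler wastes effort fetching them all. One way to picture this, assuming DUST is expressed as substring substitution rules of the form alpha -> beta (the representation used in the DustBuster work by these authors), is a canonicalizer that rewrites each URL before deciding whether to fetch it. This is a minimal sketch: the rules below are hypothetical, hand-written examples, not rules mined or reported in the talk.

```python
from typing import List, Tuple

# Assumption: a DUST rule is modelled as a substring substitution (alpha, beta).
# Replacing alpha with beta in a URL is assumed to give a URL whose page has
# (nearly) the same text. The rules used in the example are hypothetical.
Rule = Tuple[str, str]


def canonicalize(url: str, rules: List[Rule]) -> str:
    """Apply each rule in order, replacing every occurrence of alpha with beta."""
    for alpha, beta in rules:
        url = url.replace(alpha, beta)
    return url


def dedupe(urls: List[str], rules: List[Rule]) -> List[str]:
    """Keep the first URL seen for each canonical form, preserving input order."""
    seen = set()
    kept = []
    for url in urls:
        key = canonicalize(url, rules)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


if __name__ == "__main__":
    rules = [("/index.html", "/"), ("://www.", "://")]  # hypothetical rules
    urls = [
        "http://www.example.com/a/index.html",
        "http://example.com/a/",
        "http://example.com/b/",
    ]
    # The first two URLs share the canonical form http://example.com/a/
    print(dedupe(urls, rules))
    # ['http://www.example.com/a/index.html', 'http://example.com/b/']
```

A crawler would then fetch only the URLs that `dedupe` keeps. Rule order matters, and a wrong rule would merge pages that actually differ, so candidate rules need to be checked before they are trusted.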
18. Envelopes Example
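The slide's content did not survive extraction, so the following is only an illustration under an assumed definition: an envelope of a substring alpha is a (prefix, suffix) pair such that prefix + alpha + suffix is a URL in the crawl or log sample, and envelopes shared by alpha and beta are evidence that alpha -> beta may be a DUST rule. The function names and the example URLs are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumption: an envelope of alpha is a pair (prefix, suffix) such that
# prefix + alpha + suffix is one of the observed URLs.
Envelope = Tuple[str, str]


def envelopes(alpha: str, urls: Iterable[str]) -> Set[Envelope]:
    """Collect every (prefix, suffix) around each occurrence of alpha in the URLs."""
    found: Set[Envelope] = set()
    for url in urls:
        start = url.find(alpha)
        while start != -1:
            found.add((url[:start], url[start + len(alpha):]))
            start = url.find(alpha, start + 1)
    return found


def shared_envelopes(alpha: str, beta: str, urls: Iterable[str]) -> Set[Envelope]:
    """Envelopes in which both alpha and beta occur (support for alpha -> beta)."""
    url_list = list(urls)
    return envelopes(alpha, url_list) & envelopes(beta, url_list)


if __name__ == "__main__":
    urls = [
        "http://news.example.com/story?id=42",
        "http://news.example.com/article?id=42",
        "http://news.example.com/story?id=7",
    ]
    # Only the id=42 envelope is seen with both "story" and "article".
    print(shared_envelopes("story", "article", urls))
```

Counting shared envelopes rather than fetching pages lets candidate rules be compared using URL strings alone; whether the rewritten URLs really return similar text still has to be checked separately.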
29. Precision at k
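Precision at k is the fraction of the top k items in a ranked list that are correct. Assuming the slide applies it to a ranked list of candidate DUST rules, each later judged valid or invalid, it can be computed as below. The rule identifiers in the example are hypothetical placeholders, not results from the talk.

```python
from typing import Hashable, Sequence, Set


def precision_at_k(ranked: Sequence[Hashable], valid: Set[Hashable], k: int) -> float:
    """Fraction of the top-k ranked items that appear in the set of valid items.

    Divides by k even when fewer than k items are ranked (one common convention;
    some evaluations divide by min(k, len(ranked)) instead).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in valid)
    return hits / k


if __name__ == "__main__":
    ranked_rules = ["rule_a", "rule_b", "rule_c", "rule_d"]  # hypothetical ranking
    valid_rules = {"rule_a", "rule_c"}                        # hypothetical judgments
    for k in (1, 2, 4):
        print(k, precision_at_k(ranked_rules, valid_rules, k))
```

Plotting this value for increasing k shows how quickly correct rules stop appearing near the top of the ranking.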
30. Precision vs. Validation
34. THE END