Catmandu
What is it?
• a Perl library
• a command line tool
• to import, transform and export (library) data
• in a pragmatic way
• can handle large streams of data

Where do I find it?
• http://librecat.org/
• https://github.com/LibreCat
• http://search.cpan.org/search?query=Catmandu
Show of hands
• programming?
• json?
• command line user?
Show me
$ catmandu convert JSON to YAML

$ catmandu convert JSON \
    --file /path/to/file.json \
    to YAML \
    --file /path/to/file.yaml \
    --fix 'capitalize("title")' \
    --fix 'trim("abstract")'
Show me
$ catmandu import MARC \
    --file /path/to/records.xml \
    --type MARCXML \
    to MongoDB \
    --database-name catalogue \
    --bag records \
    --verbose
Show me
$ catmandu import MARC \
    --file /path/to/records.xml \
    --type MARCXML \
    to MongoDB \
    --database-name catalogue \
    --bag records \
    --verbose \
    --fix 'marc_map("245","title")' \
    --fix 'marc_map("100","authors.$append")' \
    --fix 'marc_map("008/35-35","language")'
Commands
$ catmandu convert
convert data from one file format into another

$ catmandu import
import data from a file into a store

$ catmandu export
export data from a store into a file

$ catmandu move
copy data from a store into another store

$ catmandu count
count the number of objects in a store

$ catmandu delete
delete objects from a store
Commands

$ catmandu repl
In Perl
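use Catmandu;

# read CSV records with the given column names
my $importer = Catmandu->importer('CSV',
    fields => ['person_id', 'name']);

# a bag is a named collection of records inside a store
my $bag = Catmandu->store('ElasticSearch',
    index_name => "myapp")->bag("people");

# $out is an output file defined elsewhere
my $exporter = Catmandu->exporter('JSON', file => $out);

# store all imported records plus one extra record
$bag->add_many($importer);
$bag->add({person_id => "123", name => "mr. jones"});
$bag->commit;

# write everything in the bag out as JSON
$exporter->add_many($bag);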
In Perl
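use Catmandu;
use feature 'say';    # needed for say() below

my $importer = Catmandu->importer('CSV',
    fields => ['person_id', 'name']);

# fixes can come from a fix file, from inline fix strings, or both
my $fixer = Catmandu->fixer([
    '/path/to/fix/file.txt',
    'capitalize("name")',
]);

# wrap the importer so every record passes through the fixes
$importer = $fixer->fix($importer);

$importer->each(sub {
    my $person = shift;
    say $person->{"name"};
});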
Fix file example
add_field('my.deeply.nested.field', "value");
add_field('my.list.$append', "value");

remove_field('my.list.3');
remove_field('my.list.$last');

if_exists('my.key');
    cmd('python transform.py');
end();
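
The dotted paths in the fix file address positions inside a nested record. As a rough illustration of that idea, here is a minimal sketch in plain Perl. The helper `set_path` is hypothetical and is not Catmandu's implementation; it only assumes that `$append` means "next free slot in a list" and that a numeric key means a list index.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Catmandu: set $value at a dotted path,
# creating intermediate hashes and arrays as needed.
sub set_path {
    my ($data, $path, $value) = @_;
    my @keys = split /\./, $path;
    my $last = pop @keys;
    my $node = $data;

    for my $i (0 .. $#keys) {
        my $key  = $keys[$i];
        my $next = $i < $#keys ? $keys[$i + 1] : $last;

        # The key that follows decides which container to create:
        # an index or '$append' needs an array, anything else a hash.
        my $fresh = $next =~ /^(?:\d+|\$append)$/ ? [] : {};

        if (ref($node) eq 'ARRAY') {
            $key = scalar @$node if $key eq '$append';
            $node->[$key] //= $fresh;
            $node = $node->[$key];
        }
        else {
            $node->{$key} //= $fresh;
            $node = $node->{$key};
        }
    }

    if (ref($node) eq 'ARRAY') {
        $last = scalar @$node if $last eq '$append';
        $node->[$last] = $value;
    }
    else {
        $node->{$last} = $value;
    }
    return $data;
}

my $record = {};
set_path($record, 'my.deeply.nested.field', 'value');
set_path($record, 'my.list.$append', 'first');
set_path($record, 'my.list.$append', 'second');

print join(',', @{ $record->{my}{list} }), "\n";    # prints "first,second"
print $record->{my}{deeply}{nested}{field}, "\n";   # prints "value"
```

Note the single quotes around the paths: in Perl, as in the shell, double quotes would try to interpolate `$append` as a variable.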
Internal data model
• plain data, no objects
• basically everything that is representable as JSON

{
    title   => "my title",
    authors => [
        {name => "mr. jones"},
        {name => "mr. smith"},
    ],
    weight  => 1.73,
}
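
To make "representable as JSON" concrete, the sketch below round-trips the record from this slide through JSON using the core module JSON::PP. It does not use Catmandu itself; it only shows that such a record is ordinary nested hashes, arrays, strings and numbers.

```perl
use strict;
use warnings;
use JSON::PP;    # core module since Perl 5.14

# The record from the slide: no objects, just plain data.
my $record = {
    title   => "my title",
    authors => [ { name => "mr. jones" }, { name => "mr. smith" } ],
    weight  => 1.73,
};

# canonical() sorts hash keys so the output is stable between runs.
my $json = JSON::PP->new->canonical;

my $text = $json->encode($record);    # Perl data to JSON text
my $copy = $json->decode($text);      # JSON text back to Perl data

print "$text\n";
print $copy->{authors}[1]{name}, "\n";    # prints "mr. smith"
```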
Main Catmandu parts

• Catmandu
• Catmandu::Importer (Iterable)
• Catmandu::Exporter (Addable, Fixable)
• Catmandu::Store (Addable, Fixable, Iterable)
• Catmandu::Bag (Addable, Fixable, Iterable[, Searchable])
• Catmandu::Hits (Iterable)
• Catmandu::Fix: Catmandu::Fix::Base, Catmandu::Fix::Condition
Importers
• Atom
• CSV
• JSON
• YAML
• MARC
• MAB
• ArXiv
• CrossRef
• LDAP
• OAI
• PLoS
• PubMed
• SRU
• ORCID
• Z39.50
• Inspire
Importers
• MediaMosa
• AlephX
Stores
• DBI
• MongoDB
• ElasticSearch
• Solr
• FedoraCommons
• CouchDB
• Hash
Exporters
• Atom
• BibTeX
• CSV
• JSON
• RIS
• Template
• XLS
• YAML
• MARCXML
• RTF
• ODS
Fixes
• add_field
• append
• capitalize
• clone
• collapse
• copy_field
• downcase
• expand
• join_field
• move_field
• nothing
• prepend
• remove_field
Fixes
• replace_all
• retain_field
• set_field
• split_field
• substring
• trim
• upcase
• marc_map
• marc_in_json
• marc_xml
• mab_map
• mab_in_json
• mab_xml
• cmd
Fixes
• sum
• lookup
• lookup_in_store
• to_json
• from_json
Fixes (conditionals)
• if_all_match
• unless_all_match
• if_any_match
• unless_any_match
• if_exists
• unless_exists
• otherwise
• end

Catmandu presentation at SWIB 2013