SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Downloaden Sie, um offline zu lesen
A linked open data architecture for contemporary
historical archives
Alexandre Rademaker1 Suemi Higuchi2
DÂŽario Augusto B. Oliveira2
IBM Research and FGV/EMAp
FGV/CPDOC
September 25, 2013
Getulio Vargas Foundation (FGV)
Brazilian higher education and
research institution founded in
December 20, 1944. It oïŹ€ers regular
courses of Economics, Business
Administration, Law, Social Sciences
and Applied Mathematics. Its
original goal was to train people for
the country’s public- and
private-sector management. It is
considered by Foreign Policy
magazine to be a top-5 policymaker
think-tank worldwide.
http://portal.fgv.br
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 2 / 22
CPDOC - Center of Brazilian Contemporary History
A major center for teaching and researching in the Social Sciences
and Contemporary History located in Rio de Janeiro. It holds:
Personal Archives (Acessus) ≈ 200 archives, up to 1,8M docs or
5.2M pages (700K digitalized), among text (handwritten and
printed), letters, memos, diaries, images and videos.
Oral History Program (PHO) A huge set of testimonies (in audio
and video) consisting of more than 2K interviews, which correspond
to up to 6K hours of recordings. 90% in digital format. Only 10% is
transcribed. Limit access, not online.
Brazilian Historical Biographic Dictionary (DHBB) 7,5K entries,
6,5K are of biographical and 1K related to institutions, events and
concepts of interest for the Brazilian history after 1930. Carefully
revised entries by researchers. Few metadata.
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 3 / 22
Currently Architecture
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 4 / 22
Currently Relational DB
89 tables/classes and 660 columns/properties.
TIPO_ARQUIVO
PK CD_TIPO_ARQUIVO
NM_TIPO_ARQUIVO
PO_Pasta
PK IDPasta
Tipo
Descricao
DataCriacao
UltimaModificacao
FK1 IDUsuario
ENTREVISTADO
PK CD_ENTREVISTADO
NM_ENTREVISTADO
NM_SOBRENOME_ENTREVISTADO
NM_NACIONALIDADE_ENTREVISTADO
CD_EST_CIVIL_ENTREVISTADO
NM_PROFISS_ENTREVISTADO
CD_CPF_ENTREVISTADO
CD_RG_ENTREVISTADO
CD_ORG_EMISS_ENTREVISTADO
NM_LOGR_RESID_ENTREVISTADO
NM_BAIR_RESID_ENTREVISTADO
NM_CID_RESID_ENTREVISTADO
SG_UF_RESID_ENTREVISTADO
NM_PAIS_RESID_ENTREVISTADO
CD_CEP_RESID_ENTREVISTADO
CD_TEL_RESID_ENTREVISTADO
CD_CELULAR_RESID_ENTREVISTADO
CD_EMAIL_RESID_ENTREVISTADO
NM_LOGR_COMERC_ENTREVISTADO
NM_BAIR_COMERC_ENTREVISTADO
NM_CID_COMERC_ENTREVISTADO
SG_UF_COMERC_ENTREVISTADO
NM_PAIS_COMERC_ENTREVISTADO
CD_CEP_COMERC_ENTREVISTADO
CD_TEL_COMERC_ENTREVISTADO
CD_CELULAR_COMERC_ENTREVISTADO
CD_EMAIL_COMERC_ENTREVISTADO
NM_CONTATO_ENTREVISTADO
DS_QLFCAO_CONTATO_ENTREVISTADO
CD_TEL_CONTATO_ENTREVISTADO
CD_CELULAR_CONTATO_ENTREVISTADO
CD_EMAIL_CONTATO_ENTREVISTADO
DT_NASC_ENTREVISTADO
FK2 NM_LOC_NASC_ENTREVISTADO
DT_FALEC_ENTREVISTADO
FK1 NM_LOCAL_FALEC_ENTREVISTADO
DS_ATIVIDADE
DS_FORMACAO
DS_OBSERVACAO
NM_COMPLETO_ENTREVISTADO_PESQ
DH_VERBETE
PK CD_VRB
CD_TP_VRB
IN_SIT_EDICAO_VRB
NM_VRB
NM_PESQ_VRB
CD_VRB_ORIGINAL
DS_LEAD_VRB
DS_OBS_VRB
DT_ATU_VRB
CD_LOGIN_USUSIS
DS_CONTEUDO
cd_vti
NM_CONHECIDO_VRB
IN_PUBLICADO
FK1 CD_UNIDADE_DOCUMENTAL
CONDICAO_ACESSO_ENTREVISTA
PK CD_CONDICAO_ACESSO_ENTREVISTA
FK2 CD_ENTREVISTA
FK1 CD_CONDICAO_ACESSO
IN_LIBERADO
AC_RESPONSABILIDADE
PK CD_RES
DS_RES
DT_ATU_RES
CD_LOGIN_USUSIS
AC_MANUSCRITO
CD_MAN
CD_CLASSIFICACAO_MAN
NR_DOCUMENTOS_MAN
IN_DOCUMENTO_TIPO_MAN
NR_DOCUMENTOS_TIPO_MAN
DS_PERIODO_PRODUCAO_MAN
NR_ANO_PRODUCAO_DE_MAN
NR_ANO_PRODUCAO_ATE_MAN
CD_PDA
CD_MICROFILME_MAN
DS_RESUMO_MAN
DS_NOTAS_MAN
DT_ATU_MAN
CD_LOGIN_USUSIS
FK1 CD_UNIDADE_DOCUMENTAL_MANUSCRITO
AC_INSTITUICAO
PK CD_INS
U1 DS_INS
DT_ATU_INS
CD_LOGIN_USUSIS
AC_ARQUIVO_UNIDADE_DOCUMENTAL
PK,FK2 CD_UNIDADE_DOCUMENTAL
PK,FK1 CD_ARQUIVO
NR_SEQUENCIA
TIPO_SUMARIO
PK CD_TIPO_SUMARIO
DS_TIPO_SUMARIO
PALAVRA_NAO_CAPITALIZAVEL
PALAVRA
AC_TITULACAO
PK SG_TIT
DS_TIT
NR_ORDEM
AC_DOADOR_FUNDO
PK CD_DOADOR
PK,FK1 SG_FUN
AC_DESCRITOR_UNIDADE_DOCUMENTAL
PK,FK1 CD_UNIDADE_DOCUMENTAL
PK CD_DEL
AC_COLECAO_UNIDADE_DOCUMENTAL
PK CD_COLECAO
PK,FK1 CD_UNIDADE_DOCUMENTAL
NR_SEQUENCIA
PARAMETRO_CONSULTA
NM_PARAMETRO
CD_NOTREE_PARAMETRO
DS_CONTEUDO_PARAMETRO
IN_REL_VISIVEL_PARAMETRO
IN_USU_COMUM_PARAMETRO
DS_EXPLICACAO_PARAMETRO
INSTITUICAO
PK CD_INSTITUICAO
U1 NM_RAZ_SOC_INSTITUICAO
NM_LOGR_INSTITUICAO
NM_BAIR_INSTITUICAO
NM_CID_INSTITUICAO
SG_UF_INSTITUICAO
NM_PAIS_INSTITUICAO
CD_CEP_INSTITUICAO
CD_TEL_INSTITUICAO
CD_FAX_INSTITUICAO
CD_EMAIL_INSTITUICAO
DH_CARGO
PK CD_CARGO
DS_CARGO
DS_ABREV_CARGO
CD_CARGO_TP
DT_ATU_CARGO
CD_LOGIN_USUSIS
AC_EXEMPLAR_PERIODICO
PK,FK2 CD_UNIDADE_DOCUMENTAL_EXEMPLAR
CD_EPR
CD_PRI
CD_VOLUME_EPR
CD_NUMERO_EPR
DS_DATA_PUBLICACAO_EPR
NR_ANO_PUBLICACAO_DE_EPR
NR_ANO_PUBLICACAO_ATE_EPR
DS_NOTAS_EPR
DT_ATU_EPR
CD_LOGIN_USUSIS
FK1 CD_UNIDADE_DOCUMENTAL_PERIODICO
AC_DOADOR
PK CD_DOADOR
NM_DOADOR
NM_CONJUGE_DOA
DS_PRINCIPAIS_ATIVIDADES_DOA
DS_NOTAS_DOA
IN_CONSELHO_DOADORES
DT_ATU_DOA
CD_LOGIN_USUSIS
CD_SERVICO
CD_EXT_SERVCLI
IN_FALECIDO
DIA_FALECIMENTO
MES_FALECIMENTO
ANO_FALECIMENTO
AC_AUTORIDADE
PK CD_AUT
NM_AUT
DT_ATU_AUT
CD_LOGIN_USUSIS
AC_ARTIGO_PERIODICO
PK,FK2 CD_UNIDADE_DOCUMENTAL_ARTIGO
CD_APR
CD_EPR
DS_TITULO_APR
CD_PAGINACAO_APR
DS_NOTAS_APR
DT_ATU_APR
CD_LOGIN_USUSIS
FK1 CD_UNIDADE_DOCUMENTAL_EXEMPLAR
TIPO_SUPORTE
PK CD_TIPO_SUPORTE
NM_TIPO_SUPORTE
SISTEMA
PK CD_SISTEMA
NM_SISTEMA
PO_SolicitacaoLog
PK IDLog
TipoLog
DataLog
IDSolicitacao
Descricao
DataAbertura
DataEncerramento
Status
Resultado
OmitirUsuario
IDUsuarioExterno
IDUsuarioResponsavel
IDUD
IDArquivo
IDAnotacaoUD
IDAnotacaoArquivo
PO_PastaArquivo
IDPasta
IDUD
IDArquivo
IDPastaUnidadeDeDescricao
DataInclusao
AC_TITULAR_AREA
PK CD_TITARE
FK4 CD_TFU
FK3 SG_TIT
FK1 CD_AAC
FK2 CD_INS
NM_LOCAL_TITARE
NR_ANO_FORMATURA_TITARE
AC_SERIE
PK CD_SER
FK1 SG_FUN
DS_SER
SG_SIGLA_SER
DS_NOTAS_SER
DT_LIBERACAO_SER
DT_ATU_SER
CD_LOGIN_USUSIS
IN_LIBERADA_CONSULTA
AC_LIVRO
PK CD_UNIDADE_DOCUMENTAL_LIVRO
CD_LIV
CD_CLASSIFICACAO_LIV
DS_TITULO_LIV
NM_EDICAO_LIV
NR_ANO_PUBLICACAO_DE_LIV
NR_ANO_PUBLICACAO_ATE_LIV
NM_LOCAL_PUBLICACAO_LIV
NM_EDITOR_LIV
NM_NUMERO_PAGINAS_LIV
DS_VOLUME_LIV
IN_ILUSTRACAO_LIV
NM_COLECAO_SERIE_LIV
DS_NOTAS_LIV
DS_INFO_PATRIMONIAL_LIV
DT_ATU_LIV
CD_LOGIN_USUSIS
AC_CONDICAO_ACESSO
PK CD_CONDICAO_ACESSO
DS_CONDICAO_ACESSO
SG_CONDICAO_ACESSO
UF
PK UF_SIGLA
UF_NOME
SITUACAO_ENTREVISTA
PK CD_SIT_ENTREVISTA
DS_SIT_ENTREVISTA
PO_AnotacaoArquivo
PK IDAnotacaoArquivo
IDArquivo
Descricao
DataCriacao
UltimaModificacao
ARQUIVO_DIGITAL_DESCRITOR
PK,FK1 CD_ARQUIVO_DIGITAL
PK CD_DEL
AC_CONDICAO_ACESSO_FUNDO
PK CD_CONDICAO_ACESSO_FUNDO
FK1 CD_CONDICAO_ACESSO
FK2 SG_FUN
FK3 CD_TIPO_UNIDADE_DOCUMENTAL
DT_CONDICAO_ACESSO
AC_AUTORIDADE_NAO_ELEITA
PK CD_AUT_NAO_ELEITA
NM_AUT_NAO_ELEITA
FK1 CD_AUT
DT_ATU_AUT_NAO_ELEITA
CD_LOGIN_USUSIS
USUARIO
PK CD_USUARIO
DS_LOGIN
FK1 CD_PERFIL_ACESSO
SUPORTE
PK CD_SUPORTE
FK1 CD_SESSAO_GRAVACAO
FK3 CD_TIPO_SUPORTE
DS_SUPORTE
FK2 CD_TECNICO
NR_QUANTIDADE_SUPORTE
PO_PastaUnidadeDeDescricao
PK IDPastaUnidadeDeDescricao
FK1 IDPasta
IDUnidadeDeDescricao
Origem
DataInclusao
InCopia
PERFIL_ACESSO
PK,FK1 CD_PERFIL_ACESSO
NM_PERFIL_ACESSO
CD_SISTEMA
ENTREVISTADOR
PK CD_ENTREVISTADOR
U1 NM_ENTREVISTADOR
NM_LOGR_ENTREVISTADOR
NM_BAIR_ENTREVISTADOR
NM_CID_ENTREVISTADOR
SG_UF_ENTREVISTADOR
NM_PAIS_ENTREVISTADOR
CD_CEP_ENTREVISTADOR
CD_TEL_ENTREVISTADOR
CD_CELULAR_ENTREVISTADOR
CD_EMAIL_ENTREVISTADOR
FK1 CD_INSTITUICAO
DS_FORMAC_ENTREVISTADOR
DS_OBS_ENTREVISTADOR
DH_CARGO_FUNCAO
PK CD_CARGO_FUNC
CD_CARGO
DS_CARGO_FUNC
BASEBUSCA
PK c4
c1
c2
c3
c5
c6
c7
c8
c9
c10
c11
c12
c13
c14
c15
AC_UNIDADE_DOCUMENTAL
PK CD_UNIDADE_DOCUMENTAL
FK2 CD_TIPO_UNIDADE_DOCUMENTAL
CD_SER
FK1 CD_SSE
AC_AREA_ACADEMICA
PK CD_AAC
DS_AAC
DT_ATU_AAC
CD_LOGIN_USUSIS
PO_AnotacaoUD
PK IDAnotacaoUD
IDUD
Descricao
DataCriacao
UltimaModificacao
AC_ATIVIDADE
PK CD_ATI
DS_ATI
DT_ATU_ATI
CD_LOGIN_USUSIS
PO_UsuarioPerfil
FK1 IDPerfil
FK2 IDUsuario
PERFIL_HISTORAL_CATALOGO
PK,FK1 CD_PESQUISA
PK DT_PESQUISA_PHC
CD_ENTREVISTA
DH_CARGO_TIPO
PK CD_CARGO_TP
DS_CARGO_TP
AC_TITULAR_ATIVIDADE
PK CD_TITATI
FK3 CD_TFU
FK1 CD_ATI
FK2 CD_INS
NR_MES_INICIO_TITATI
NR_ANO_INICIO_TITATI
NR_MES_FIM_TITATI
NR_ANO_FIM_TITATI
AC_SUB_SERIE
PK CD_SSE
FK1 CD_SER
DS_SSE
SG_SIGLA_SSE
DT_ATU_SSE
CD_LOGIN_USUSIS
AC_DESCRITOR_ELEITO
PK CD_DEL
U1 DS_DEL
DT_ATU_DEL
CD_LOGIN_USUSIS
IN_EXCLUSIVO_ACCESSUS
IN_EXCLUSIVO_ESTUDOS_HISTORICOS
IN_EXCLUSIVO_ABHO
IN_EXCLUSIVO_PRODUCAO_INTELECTUAL
DS_CONCEITUACAO_TERMO
AC_AUTORIDADE_UNIDADE_DOCUMENTAL
PK CD_AUTORIDADE_UNIDADE_DOCUMENTAL
FK3 CD_UNIDADE_DOCUMENTAL
FK1 CD_AUT
FK2 CD_RES
TECNICO
PK CD_TECNICO
U1 NM_TECNICO
PROJETO
PK CD_PROJETO
U1 NM_TIT_PROJETO
DT_INIC_PROJETO
DT_FIM_PROJETO
DS_RESULTADO
FK2 CD_INSTITUICAO_CONVENIO
DS_COND_CONTRATO
FK1 CD_INSTITUICAO_FINANC
DS_OBSERVACAO
LOCALIDADE
PK CD_LOCALIDADE
NM_CID_LOCALIDADE
SG_UF_LOCALIDADE
NM_PAIS_LOCALIDADE
DH_VERBETE_SUBTIPO
CD_VTI
DS_VTI
DS_SUB_VTI
DH_VERBETE_BIO_CARGO
PK CD_VBC
NR_DATA_INI
NR_DATA_FIM
CD_VRB
CD_CARGO
SG_UF
SG_PAIS
CD_CARGO_FUNC
CD_INS
CD_CID
AC_UNIDADE_DOCUMENTAL_FUNDO
PK,FK1 SG_FUN
PK,FK2 CD_UNIDADE_DOCUMENTAL
AC_PERIODICO
PK,FK1 CD_UNIDADE_DOCUMENTAL_PERIODICO
CD_PRI
CD_CLASSIFICACAO_PRI
DS_TITULO_PRI
NM_EDITOR_PRI
NM_LOCAL_PUBLICACAO_PRI
NM_PERIODICIDADE_PRI
NM_IDIOMA_PRI
DS_NOTAS_PRI
DT_ATU_PRI
CD_LOGIN_USUSIS
AC_FUNDO
PK SG_FUN
NM_FUN
DT_DOACAO_FUN
QT_VOLUME_ESTIMADO_FUN
DS_LOCALIZACAO_FISICA_FUN
DS_LOCALIZACAO_DIGITAL_FUN
DS_CODIGO_MICROFILME_FUN
DS_EQUIPE_FUN
DS_HISTORICO_ACERVO_FUN
DS_CONTEUDO_FUN
DS_NOTAS_FUN
DT_ATU_FUN
CD_LOGIN_USUSIS
DT_ABERTURA_CONSULTA_FUN
AC_ARQUIVO
PK CD_ARQUIVO
NM_ARQUIVO
DS_CAMINHO_ARQUIVO
DS_CAMINHO_ARQUIVO_ICON
DS_TEXTO_ARQUIVO
FK1 CD_TIPO_ARQUIVO
PO_Perfil
PK IDPerfil
Nome
FK1 CodigoSistema
PERFIL_PESQUISA_DESCRITOR
PK,FK1 CD_PESQUISA
PK CD_DNE
ENTREVISTA_ENTREVISTADO
PK,FK1 CD_ENTREVISTA
PK,FK2 CD_ENTREVISTADO
DH_CIDADE
CD_CID
DS_CID
DT_ATU_CID
CD_LOGIN_USUSIS
CD_RBR
AC_USUARIO_FUNDO
PK,FK3 CD_USU_FUN
CD_USERID_USU
FK4 CD_USUARIO
FK1 SG_FUN
FK2 CD_TIPO_UNIDADE_DOCUMENTAL
AC_DOADOR_ENDERECO
PK CD_DOADOR
PK CD_TIPO_ENDERECO
NM_LOGRADOURO
NR_NUMERO
NM_COMPLEMENTO
NM_BAIRRO
NM_CIDADE
FK2 UF_SIGLA
FK1 PA_SIGLA
NR_CEP
DT_ATU_DOE
CD_LOGIN_USUSIS
AC_AUDIOVISUAL
PK,FK1 CD_UNIDADE_DOCUMENTAL_AUDIOVISUAL
CD_AVI
CD_TDA
CD_CLASSIFICACAO_AVI
CD_PREFIXO_TITULO_AVI
DS_TITULO_AVI
DS_PERIODO_PRODUCAO_AVI
NR_ANO_PRODUCAO_DE_AVI
NR_ANO_PRODUCAO_ATE_AVI
CD_PDA
NR_DOCUMENTOS_AVI
DS_FISICA_AVI
DS_NOTAS_AVI
DS_RESUMO_AVI
DT_ATU_AVI
CD_LOGIN_USUSIS
TECNICO_ENTREVISTA
PK CD_TECNICO_ENTREVISTA
FK1 CD_ENTREVISTA
FK3 CD_TECNICO
FK2 CD_FUNCAO
PO_AreaAcademica
PK IDAreaAcademica
Nome
PERFIL_PESQUISA_ENTREVISTADO
PK,FK2 CD_PESQUISA
PK,FK1 CD_ENTREVISTADO
LOCALIDADE_ENTREVISTA
PK,FK1 CD_ENTREVISTA
PK,FK2 CD_LOCALIDADE
DOADOR
PK CD_DOADOR
U1 NM_DOADOR
DS_DOADOR
AC_CAPITULO_LIVRO
PK CD_UNIDADE_DOCUMENTAL_CAPITULO
CD_CLI
CD_LIV
DS_TITULO_CLI
CD_PAGINACAO_CLI
DS_NOTAS_CLI
DT_ATU_CLI
CD_LOGIN_USUSIS
FK1 CD_UNIDADE_DOCUMENTAL_LIVRO
PO_Usuario
PK IDUsuario
Tipo
Nome
Email
LoginFGV
Senha
DataNascimento
Sexo
FK2 IDGrauInstrucao
FK1 IDAreaAcademica
Cidade
IDUF
IDPais
FlagSpan
DataCriacao
Status
Guid
DataBloqueio
FlagSpanArquivologia
FlagSpanCienciasSociais
FlagSpanHistoria
FlagSpanNewsletterCPDOC
PO_Solicitacao
PK IDSolicitacao
Descricao
DataAbertura
DataEncerramento
Status
Resultado
OmitirUsuario
FK1 IDUsuarioExterno
FK2 IDUsuarioResponsavel
IDUD
IDArquivo
IDAnotacaoUD
IDAnotacaoArquivo
PO_GrauInstrucao
PK IDGrauInstrucao
Nome
ENTREVISTA_ENTREVISTADOR
PK,FK1 CD_ENTREVISTA
PK,FK2 CD_ENTREVISTADOR
AC_TIPO_ARQUIVO
PK CD_TIPO_ARQUIVO
NM_TIPO_ARQUIVO
AC_ARQUIVO_COLECAO
PK,FK1 CD_ARQUIVO
PK,FK2 CD_COLECAO
NR_SEQUENCIA
PO_Mensagem
PK IDMensagem
Texto
DataCriacao
FK1 IDSolicitacao
IDUsuario
PERFIL_PESQUISA_USUARIO
PK CD_PESQUISA
CD_USUARIO
DT_PESQUISA_PPU
CD_TIPO_CONSULTA
CD_TIPO_PESQ_DEL LOG_OPERACAO
PK Codigo
Operacao
Tabela
Dados
Data
ENTREVISTA_PROJETO
PK,FK1 CD_ENTREVISTA
PK,FK2 CD_PROJETO
ENTREVISTA
PK CD_ENTREVISTA
DS_OBJ_ENTREVISTA
U1 NM_TIT_ENTREVISTA
IN_TIP_ENTREVISTA
CD_DOADOR
DT_DOACAO_ENTREVISTA
DT_LIBERACAO_ENTREVISTA
DS_OBSERVACAO
IN_ARQ_DOC_ENTREVISTA
CD_SIT_ENTREVISTA
DS_REF_BIBLIOGRAFICA
DS_OBSERVACAO_FINAL
QT_DISQ_TRANSC_ENTREVISTA
NM_ARQ_TRANSC_ENTREVISTA
CD_DOC_TRANSC_ENTREVISTA
NU_PAGS_TRANSC_ENTREVISTA
DS_RESTRICOES_ACESSO
DS_JUSTIFICATIVA
NU_DISQ_TRANSC_ENTREVISTA
NM_PASTA_TRANSC_ENTREVISTA
IN_FICHA_TRANSC_ENTREVISTA
IN_FL_ROSTO_TRANSC_ENTREVISTA
IN_PROC_ENTREVISTA
NU_DISQ_AUDIO_ENTREVISTA
NM_PASTA_AUDIO_ENTREVISTA
IN_FICHA_AUDIO_ENTREVISTA
IN_FL_ROSTO_AUDIO_ENTREVISTA
NU_DISQ_AVISUAL_ENTREVISTA
NM_PASTA_AVISUAL_ENTREVISTA
IN_FICHA_AVISUAL_ENTREVISTA
IN_FL_ROSTO_AVISUAL_ENTREVISTA
DT_PREENCH_REL_ENTREVISTA
DS_CONTATO_ENTREVISTA
DS_LOC_ENTREVISTA
DS_ANDAMENTO_ENTREVISTA
DS_MUDANCA_ENTREVISTA
DS_INTERRUPCAO_ENTREVISTA
DS_PESSOA_PRESENTE
DS_COMENT_CESSAO_ENTREVISTA
DS_OUTRAS_OBSERVACOES
DT_ASSINATURA_CPDOC_ENTREVISTA
CD_ASSINADO_CPDOC_ENTREVISTA
DS_HERDEIRO_CPDOC_ENTREVISTA
DS_RESTRICAO_CPDOC_ENTREVISTA
DS_ENCAM_CPDOC_ENTREVISTA
DS_COND_USO_CPDOC_ENTREVISTA
DS_OBS_GRAVACAO
DS_RESUMO_FICHA_TEC
NM_TIT_ENTREVISTA_PESQ
DS_TEXTO_PUBLICACAO_CITACAO
DS_SUMARIO
CD_TRANSC_ENTREVISTA
FK1 CD_UNIDADE_DOCUMENTAL
IN_COBERTURA
DH_GOVERNO
PK CD_GOV
DS_GOV
NR_DATA_INIC_GOV
NR_DATA_FIM_GOV
DT_ATU_GOV
CD_LOGIN_USUSIS
AC_TITULAR_FUNDO
PK CD_TFU
NM_TFU
FK1 SG_FUN
NR_NASCIMENTO_DIA_TFU
NR_NASCIMENTO_MES_TFU
NR_NASCIMENTO_ANO_TFU
NM_PAI_TFU
NM_MAE_TFU
NM_CONJUGE_TFU
NR_FALECIMENTO_DIA_TFU
NR_FALECIMENTO_MES_TFU
NR_FALECIMENTO_ANO_TFU
DS_OUTRAS_ATIVIDADES_TFU
DS_NOTAS_TFU
DT_ATU_TFU
CD_LOGIN_USUSIS
FK2 CD_LOCALIDADE_NASCIMENTO
FK3 CD_LOCALIDADE_FALECIMENTO
AC_LOCALIDADE
PK CD_LOCALIDADE
NM_LOCALIDADE
FK1 CD_LOCALIDADE_PAI
FK2 CD_TIPO_LOCALIDADE
TECNICO_PROJETO
PK,FK1 CD_PROJETO
PK,FK2 CD_TECNICO
AJUDA
PK CD_AJUDA
DS_TEXTO_AJUDA
FK1 CD_FUNCIONALIDADE
AC_TIPO_LOCALIDADE
PK CD_TIPO_LOCALIDADE
NM_TIPO_LOCALIDADE
AC_DESCRITOR_NAO_ELEITO
PK CD_DNE
U1 DS_DNE
CD_DEL
DT_ATU_DNE
CD_LOGIN_USUSIS
IN_DNE_DEL
AC_ARQUIVO_FUNDO
PK,FK1 CD_ARQUIVO
PK,FK2 SG_FUN
NR_SEQUENCIA
TEMA_ENTREVISTA
PK,FK1 CD_ENTREVISTA
PK CD_DEL
SESSAO_GRAVACAO
PK CD_SESSAO_GRAVACAO
FK1 CD_ENTREVISTA
NU_SESS_GRAV
DT_SESS_GRAV
QT_HR_SESS_GRAV
QT_MIN_SESS_GRAV
DS_LOCAL
DS_OBSERVACOES
FUNCAO
PK CD_FUNCAO
NM_FUNCAO
ENTREVISTA_SUMARIO
PK CD_ENTREVISTA_SUMARIO
FK1 CD_ENTREVISTA
DS_SUMARIO
FK2 CD_TIPO_SUMARIO
DH_VERBETE_IMPORTADO
FK1 CD_VRB
Nome
Conteudo
InAntigo
CONDICAO_ACESSO
PK CD_CONDICAO_ACESSO
DS_CONDICAO_ACESSO
DS_CONDICAO_ACESSO_EXIBICAO
AC_TIPO_UNIDADE_DOCUMENTAL
PK CD_TIPO_UNIDADE_DOCUMENTAL
NM_TIPO_UNIDADE_DOCUMENTAL
SG_TIPO_UNIDADE_DOCUMENTAL
FK1 CD_TIPO_UNIDADE_DOCUMENTAL_PAI
NR_SEQUENCIA_EXIBICAO
PERMISSAO
PK CD_PERMISSAO
FK2 CD_PERFIL_ACESSO
FK1 CD_FUNCIONALIDADE
IN_ACESSO
IN_INCLUSAO
IN_ALTERACAO
IN_EXCLUSAO
PAIS
PK PA_SIGLA
PA_NOME
PA_NOME_INGLES
FUNCIONALIDADE
PK CD_FUNCIONALIDADE
NM_FUNCIONALIDADE
SG_FUNCIONALIDADE
FK1 CD_SISTEMA
ARQUIVO_DIGITAL
PK CD_ARQUIVO_DIGITAL
DS_CAMINHO_ARQUIVO
FK3 CD_TIPO_ARQUIVO
IN_LIBERADO_CONSULTA
DS_METADADOS
DS_LEGENDA
FK1 CD_ENTREVISTA
DS_URL_ARQUIVO
FK2 CD_SESSAO_GRAVACAO
AC_PRECISAO_DATA
PK CD_PDA
DS_PDA
AC_LOCALIDADE_UNIDADE_DOCUMENTAL
PK,FK2 CD_UNIDADE_DOCUMENTAL
PK,FK1 CD_LOCALIDADE
AC_COLECAO
PK CD_COLECAO
NM_COLECAO
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 5 / 22
Problems
Currently architecture is hard and costly to maintain and improve
given the relational model nature and systems;
innovative initiatives are usually postponed;
The data is available online but on the “deep web”;
CPDOC’s do not adopt any standard data model or vocab: (1) inhibit
interoperability with other open resources; and (2) hardly useful for
people outside CPDOC.
data ïŹles (audio, videos and images) scattered in diïŹ€erent ïŹle servers,
DB only stores metadata and ïŹle paths (loose coupling).
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 6 / 22
Some inconsistencies
“verbete” is a dictionary entry. “bio cargo” is a position (“cargo”) that the
described person had during a speciïŹc time during which he/she carried on a
particular assignment (“funcao”). Controled lists but no standards! Double
relation between “bio cargo” and “cargo”.
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 7 / 22
Inconsistencies are not always straightforward to ïŹx
DELETE {
?bioc cpdoc:dbo_DH_VERBETE_BIO_CARGO_CD_CARGO ?cargo
}
INSERT {
graph <http://cpdoc.fgv.br/sys/update1/> {
?bioc cpdoc:dbo_DH_VERBETE_BIO_CARGO_CD_CARGO_FUNC _:funcao .
_:funcao rdf:type cpdoc:dbo_DH_CARGO_FUNCAO ;
cpdoc:dbo_DH_CARGO_FUNCAO_CD_CARGO ?cargo .
}
}
WHERE {
?bioc cpdoc:dbo_DH_VERBETE_BIO_CARGO_CD_CARGO ?cargo .
filter not exists {
?bioc cpdoc:dbo_DH_VERBETE_BIO_CARGO_CD_CARGO_FUNC ?cf .
?cf cpdoc:dbo_DH_CARGO_FUNCAO_CD_CARGO ?cargo .
}
}
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 8 / 22
. . . when we recognize the battle
against chaos, mess, and unmastered
complexity as one of computing
science’s major callings, we must
admit that “Beauty is our Business”.
(Edsger W. Dijkstra)
Some beautiful arguments using mathematical induction. http: // goo. gl/ KQ9j7Q .
The Long Run Project
Joint project between CPDOC and EMAp (Mathematical School);
Enrich the structure (semantics) of CPDOC data;
Open and expose CPDOC’s data and architecture making it more
maintainable and dynamic;
Uniform and integrated data treatment (standards and interlinks
between collections).
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 10 / 22
Motivations
Open Linked Data Initiative Principals;
Distributed open source development model/tools (collaborative data
maintenance and creation);
From data owner to data curator;
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 11 / 22
The migration process
(1) D2RQ was extracted RDF from relational; (2) enrichment of data semantics
(next slides); (3) DHBB entries to simple markdown ïŹles with YAML headers; (4)
PHO and Accessus collections are moved to DRMS (standards vocab, access
control, faced search, long-term preservation, OAI-PMH support etc.
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 12 / 22
The desired architecture
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 13 / 22
Improving semantics
1-1 with original relational DB. The connection of technician and interview is
parameterized by diïŹ€erent roles, the donator, interviewer and interviewed of an
interview are modeled each one in a speciïŹc table. In this case interviewed,
interviewer, donator and technician are all people (“ad hoc” modeling).
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 14 / 22
Improving semantics
prov centric but uses skos, dc, foaf, bio and geo, frbr etc. some classes can be
subclasses of standard classes, Interview, some classes can be replaced by
standard classes, localidade.
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 15 / 22
Conclusions
Challenge 1: convince CPDOC researchers to make the transition to
data owners to curators.
Challenge 2: adapt researchers to new technologies (VC, text editors,
scripts?, distributed workïŹ‚ow etc)
Model reïŹnements (corrections, transformations by alignments) can
be not straightforward.
Still a lot to be done. For instance...
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 16 / 22
Other Research Opportunities
Natural language processing: processing the DHBB entries to
discover relations between entries and with other linked data and
resources. DHBB for NLP and vice versa!
Ontology alignmnent algorithms for (semi-)automated the model
transformations.
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 17 / 22
Natural Language Processing
Manually discovered ≈ 50 links to dbpedia (Presidents of Brazil,
presidents of the Senate, political parties etc.)
NLP and text mining of DHBB entries: (1) proper names; (2) word
sense disambiguation using the openWordnet-PT (lexical resource);
and (3) named entity recognition and creation of links between
DHBB entries.
133,036 proper names identiïŹed (some few mistakes). Potencially
entities (people, locations, organizations etc)
Use grammars, lexical resources, formal ontologies, and logical tools
to reason about knowledge obtained from processing text in
Portuguese (Computational Semantics: KB, KR, and ATP);
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 18 / 22
Natural Language Processing
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 19 / 22
Audio and Transcriptions
Sinal processing to (semi-) automatic produce transcriptions, alignment
with already available transcriptions and audio segmentation
(interviewer/inverviwed);
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 20 / 22
Faces recognition and identiïŹcation
Image processing techniques to face recognition in photos collections.
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 21 / 22
Obrigado!
S: (v) thank, give thanks (express gratitude or show appreciation to)
(=>
(and
(instance ?THANK Thanking)
(agent ?THANK ?AGENT)
(patient ?THANK ?THING)
(destination ?THANK ?PERSON))
(and
(instance ?PERSON Human)
(or
(holdsDuring
(WhenFn ?THANK)
(wants ?AGENT ?THING))
(holdsDuring
(WhenFn ?THANK)
(desires ?AGENT ?THING)))))
SUMO Ontology, http://www.ontologyportal.org
A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 22 / 22

Weitere Àhnliche Inhalte

KĂŒrzlich hochgeladen

2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 

KĂŒrzlich hochgeladen (20)

2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 

Empfohlen

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceChristy Abraham Joy
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Empfohlen (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

A linked open data architecture for contemporary historical archives

  • 1. A linked open data architecture for contemporary historical archives Alexandre Rademaker1 Suemi Higuchi2 DÂŽario Augusto B. Oliveira2 IBM Research and FGV/EMAp FGV/CPDOC September 25, 2013
  • 2. Getulio Vargas Foundation (FGV) Brazilian higher education and research institution founded in December 20, 1944. It oïŹ€ers regular courses of Economics, Business Administration, Law, Social Sciences and Applied Mathematics. Its original goal was to train people for the country’s public- and private-sector management. It is considered by Foreign Policy magazine to be a top-5 policymaker think-tank worldwide. http://portal.fgv.br A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 2 / 22
  • 3. CPDOC - Center of Brazilian Contemporary History A major center for teaching and researching in the Social Sciences and Contemporary History located in Rio de Janeiro. It holds: Personal Archives (Acessus) ≈ 200 archives, up to 1,8M docs or 5.2M pages (700K digitalized), among text (handwritten and printed), letters, memos, diaries, images and videos. Oral History Program (PHO) A huge set of testimonies (in audio and video) consisting of more than 2K interviews, which correspond to up to 6K hours of recordings. 90% in digital format. Only 10% is transcribed. Limit access, not online. Brazilian Historical Biographic Dictionary (DHBB) 7,5K entries, 6,5K are of biographical and 1K related to institutions, events and concepts of interest for the Brazilian history after 1930. Carefully revised entries by researchers. Few metadata. A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 3 / 22
  • 4. Currently Architecture A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 4 / 22
  • 5. Currently Relational DB 89 tables/classes and 660 columns/properties. TIPO_ARQUIVO PK CD_TIPO_ARQUIVO NM_TIPO_ARQUIVO PO_Pasta PK IDPasta Tipo Descricao DataCriacao UltimaModificacao FK1 IDUsuario ENTREVISTADO PK CD_ENTREVISTADO NM_ENTREVISTADO NM_SOBRENOME_ENTREVISTADO NM_NACIONALIDADE_ENTREVISTADO CD_EST_CIVIL_ENTREVISTADO NM_PROFISS_ENTREVISTADO CD_CPF_ENTREVISTADO CD_RG_ENTREVISTADO CD_ORG_EMISS_ENTREVISTADO NM_LOGR_RESID_ENTREVISTADO NM_BAIR_RESID_ENTREVISTADO NM_CID_RESID_ENTREVISTADO SG_UF_RESID_ENTREVISTADO NM_PAIS_RESID_ENTREVISTADO CD_CEP_RESID_ENTREVISTADO CD_TEL_RESID_ENTREVISTADO CD_CELULAR_RESID_ENTREVISTADO CD_EMAIL_RESID_ENTREVISTADO NM_LOGR_COMERC_ENTREVISTADO NM_BAIR_COMERC_ENTREVISTADO NM_CID_COMERC_ENTREVISTADO SG_UF_COMERC_ENTREVISTADO NM_PAIS_COMERC_ENTREVISTADO CD_CEP_COMERC_ENTREVISTADO CD_TEL_COMERC_ENTREVISTADO CD_CELULAR_COMERC_ENTREVISTADO CD_EMAIL_COMERC_ENTREVISTADO NM_CONTATO_ENTREVISTADO DS_QLFCAO_CONTATO_ENTREVISTADO CD_TEL_CONTATO_ENTREVISTADO CD_CELULAR_CONTATO_ENTREVISTADO CD_EMAIL_CONTATO_ENTREVISTADO DT_NASC_ENTREVISTADO FK2 NM_LOC_NASC_ENTREVISTADO DT_FALEC_ENTREVISTADO FK1 NM_LOCAL_FALEC_ENTREVISTADO DS_ATIVIDADE DS_FORMACAO DS_OBSERVACAO NM_COMPLETO_ENTREVISTADO_PESQ DH_VERBETE PK CD_VRB CD_TP_VRB IN_SIT_EDICAO_VRB NM_VRB NM_PESQ_VRB CD_VRB_ORIGINAL DS_LEAD_VRB DS_OBS_VRB DT_ATU_VRB CD_LOGIN_USUSIS DS_CONTEUDO cd_vti NM_CONHECIDO_VRB IN_PUBLICADO FK1 CD_UNIDADE_DOCUMENTAL CONDICAO_ACESSO_ENTREVISTA PK CD_CONDICAO_ACESSO_ENTREVISTA FK2 CD_ENTREVISTA FK1 CD_CONDICAO_ACESSO IN_LIBERADO AC_RESPONSABILIDADE PK CD_RES DS_RES DT_ATU_RES CD_LOGIN_USUSIS AC_MANUSCRITO CD_MAN CD_CLASSIFICACAO_MAN NR_DOCUMENTOS_MAN IN_DOCUMENTO_TIPO_MAN NR_DOCUMENTOS_TIPO_MAN DS_PERIODO_PRODUCAO_MAN NR_ANO_PRODUCAO_DE_MAN NR_ANO_PRODUCAO_ATE_MAN CD_PDA CD_MICROFILME_MAN DS_RESUMO_MAN DS_NOTAS_MAN DT_ATU_MAN CD_LOGIN_USUSIS FK1 CD_UNIDADE_DOCUMENTAL_MANUSCRITO AC_INSTITUICAO PK CD_INS U1 DS_INS DT_ATU_INS CD_LOGIN_USUSIS AC_ARQUIVO_UNIDADE_DOCUMENTAL PK,FK2 CD_UNIDADE_DOCUMENTAL PK,FK1 CD_ARQUIVO NR_SEQUENCIA TIPO_SUMARIO PK CD_TIPO_SUMARIO DS_TIPO_SUMARIO PALAVRA_NAO_CAPITALIZAVEL PALAVRA AC_TITULACAO PK SG_TIT DS_TIT NR_ORDEM AC_DOADOR_FUNDO PK CD_DOADOR PK,FK1 SG_FUN AC_DESCRITOR_UNIDADE_DOCUMENTAL PK,FK1 CD_UNIDADE_DOCUMENTAL PK CD_DEL AC_COLECAO_UNIDADE_DOCUMENTAL PK CD_COLECAO PK,FK1 CD_UNIDADE_DOCUMENTAL NR_SEQUENCIA PARAMETRO_CONSULTA NM_PARAMETRO CD_NOTREE_PARAMETRO DS_CONTEUDO_PARAMETRO IN_REL_VISIVEL_PARAMETRO IN_USU_COMUM_PARAMETRO DS_EXPLICACAO_PARAMETRO INSTITUICAO PK CD_INSTITUICAO U1 NM_RAZ_SOC_INSTITUICAO NM_LOGR_INSTITUICAO NM_BAIR_INSTITUICAO NM_CID_INSTITUICAO SG_UF_INSTITUICAO NM_PAIS_INSTITUICAO CD_CEP_INSTITUICAO CD_TEL_INSTITUICAO CD_FAX_INSTITUICAO CD_EMAIL_INSTITUICAO DH_CARGO PK CD_CARGO DS_CARGO DS_ABREV_CARGO CD_CARGO_TP DT_ATU_CARGO CD_LOGIN_USUSIS AC_EXEMPLAR_PERIODICO PK,FK2 CD_UNIDADE_DOCUMENTAL_EXEMPLAR CD_EPR CD_PRI CD_VOLUME_EPR CD_NUMERO_EPR DS_DATA_PUBLICACAO_EPR NR_ANO_PUBLICACAO_DE_EPR NR_ANO_PUBLICACAO_ATE_EPR DS_NOTAS_EPR DT_ATU_EPR CD_LOGIN_USUSIS FK1 CD_UNIDADE_DOCUMENTAL_PERIODICO AC_DOADOR PK CD_DOADOR NM_DOADOR NM_CONJUGE_DOA DS_PRINCIPAIS_ATIVIDADES_DOA DS_NOTAS_DOA IN_CONSELHO_DOADORES DT_ATU_DOA CD_LOGIN_USUSIS CD_SERVICO CD_EXT_SERVCLI IN_FALECIDO DIA_FALECIMENTO MES_FALECIMENTO ANO_FALECIMENTO AC_AUTORIDADE PK CD_AUT NM_AUT DT_ATU_AUT CD_LOGIN_USUSIS AC_ARTIGO_PERIODICO PK,FK2 CD_UNIDADE_DOCUMENTAL_ARTIGO CD_APR CD_EPR DS_TITULO_APR CD_PAGINACAO_APR DS_NOTAS_APR DT_ATU_APR CD_LOGIN_USUSIS FK1 CD_UNIDADE_DOCUMENTAL_EXEMPLAR TIPO_SUPORTE PK CD_TIPO_SUPORTE NM_TIPO_SUPORTE SISTEMA PK CD_SISTEMA NM_SISTEMA PO_SolicitacaoLog PK IDLog TipoLog DataLog IDSolicitacao Descricao DataAbertura DataEncerramento Status Resultado OmitirUsuario IDUsuarioExterno IDUsuarioResponsavel IDUD IDArquivo IDAnotacaoUD IDAnotacaoArquivo PO_PastaArquivo IDPasta IDUD IDArquivo IDPastaUnidadeDeDescricao DataInclusao AC_TITULAR_AREA PK CD_TITARE FK4 CD_TFU FK3 SG_TIT FK1 CD_AAC FK2 CD_INS NM_LOCAL_TITARE NR_ANO_FORMATURA_TITARE AC_SERIE PK CD_SER FK1 SG_FUN DS_SER SG_SIGLA_SER DS_NOTAS_SER DT_LIBERACAO_SER DT_ATU_SER CD_LOGIN_USUSIS IN_LIBERADA_CONSULTA AC_LIVRO PK CD_UNIDADE_DOCUMENTAL_LIVRO CD_LIV CD_CLASSIFICACAO_LIV DS_TITULO_LIV NM_EDICAO_LIV NR_ANO_PUBLICACAO_DE_LIV NR_ANO_PUBLICACAO_ATE_LIV NM_LOCAL_PUBLICACAO_LIV NM_EDITOR_LIV NM_NUMERO_PAGINAS_LIV DS_VOLUME_LIV IN_ILUSTRACAO_LIV NM_COLECAO_SERIE_LIV DS_NOTAS_LIV DS_INFO_PATRIMONIAL_LIV DT_ATU_LIV CD_LOGIN_USUSIS AC_CONDICAO_ACESSO PK CD_CONDICAO_ACESSO DS_CONDICAO_ACESSO SG_CONDICAO_ACESSO UF PK UF_SIGLA UF_NOME SITUACAO_ENTREVISTA PK CD_SIT_ENTREVISTA DS_SIT_ENTREVISTA PO_AnotacaoArquivo PK IDAnotacaoArquivo IDArquivo Descricao DataCriacao UltimaModificacao ARQUIVO_DIGITAL_DESCRITOR PK,FK1 CD_ARQUIVO_DIGITAL PK CD_DEL AC_CONDICAO_ACESSO_FUNDO PK CD_CONDICAO_ACESSO_FUNDO FK1 CD_CONDICAO_ACESSO FK2 SG_FUN FK3 CD_TIPO_UNIDADE_DOCUMENTAL DT_CONDICAO_ACESSO AC_AUTORIDADE_NAO_ELEITA PK CD_AUT_NAO_ELEITA NM_AUT_NAO_ELEITA FK1 CD_AUT DT_ATU_AUT_NAO_ELEITA CD_LOGIN_USUSIS USUARIO PK CD_USUARIO DS_LOGIN FK1 CD_PERFIL_ACESSO SUPORTE PK CD_SUPORTE FK1 CD_SESSAO_GRAVACAO FK3 CD_TIPO_SUPORTE DS_SUPORTE FK2 CD_TECNICO NR_QUANTIDADE_SUPORTE PO_PastaUnidadeDeDescricao PK IDPastaUnidadeDeDescricao FK1 IDPasta IDUnidadeDeDescricao Origem DataInclusao InCopia PERFIL_ACESSO PK,FK1 CD_PERFIL_ACESSO NM_PERFIL_ACESSO CD_SISTEMA ENTREVISTADOR PK CD_ENTREVISTADOR U1 NM_ENTREVISTADOR NM_LOGR_ENTREVISTADOR NM_BAIR_ENTREVISTADOR NM_CID_ENTREVISTADOR SG_UF_ENTREVISTADOR NM_PAIS_ENTREVISTADOR CD_CEP_ENTREVISTADOR CD_TEL_ENTREVISTADOR CD_CELULAR_ENTREVISTADOR CD_EMAIL_ENTREVISTADOR FK1 CD_INSTITUICAO DS_FORMAC_ENTREVISTADOR DS_OBS_ENTREVISTADOR DH_CARGO_FUNCAO PK CD_CARGO_FUNC CD_CARGO DS_CARGO_FUNC BASEBUSCA PK c4 c1 c2 c3 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 AC_UNIDADE_DOCUMENTAL PK CD_UNIDADE_DOCUMENTAL FK2 CD_TIPO_UNIDADE_DOCUMENTAL CD_SER FK1 CD_SSE AC_AREA_ACADEMICA PK CD_AAC DS_AAC DT_ATU_AAC CD_LOGIN_USUSIS PO_AnotacaoUD PK IDAnotacaoUD IDUD Descricao DataCriacao UltimaModificacao AC_ATIVIDADE PK CD_ATI DS_ATI DT_ATU_ATI CD_LOGIN_USUSIS PO_UsuarioPerfil FK1 IDPerfil FK2 IDUsuario PERFIL_HISTORAL_CATALOGO PK,FK1 CD_PESQUISA PK DT_PESQUISA_PHC CD_ENTREVISTA DH_CARGO_TIPO PK CD_CARGO_TP DS_CARGO_TP AC_TITULAR_ATIVIDADE PK CD_TITATI FK3 CD_TFU FK1 CD_ATI FK2 CD_INS NR_MES_INICIO_TITATI NR_ANO_INICIO_TITATI NR_MES_FIM_TITATI NR_ANO_FIM_TITATI AC_SUB_SERIE PK CD_SSE FK1 CD_SER DS_SSE SG_SIGLA_SSE DT_ATU_SSE CD_LOGIN_USUSIS AC_DESCRITOR_ELEITO PK CD_DEL U1 DS_DEL DT_ATU_DEL CD_LOGIN_USUSIS IN_EXCLUSIVO_ACCESSUS IN_EXCLUSIVO_ESTUDOS_HISTORICOS IN_EXCLUSIVO_ABHO IN_EXCLUSIVO_PRODUCAO_INTELECTUAL DS_CONCEITUACAO_TERMO AC_AUTORIDADE_UNIDADE_DOCUMENTAL PK CD_AUTORIDADE_UNIDADE_DOCUMENTAL FK3 CD_UNIDADE_DOCUMENTAL FK1 CD_AUT FK2 CD_RES TECNICO PK CD_TECNICO U1 NM_TECNICO PROJETO PK CD_PROJETO U1 NM_TIT_PROJETO DT_INIC_PROJETO DT_FIM_PROJETO DS_RESULTADO FK2 CD_INSTITUICAO_CONVENIO DS_COND_CONTRATO FK1 CD_INSTITUICAO_FINANC DS_OBSERVACAO LOCALIDADE PK CD_LOCALIDADE NM_CID_LOCALIDADE SG_UF_LOCALIDADE NM_PAIS_LOCALIDADE DH_VERBETE_SUBTIPO CD_VTI DS_VTI DS_SUB_VTI DH_VERBETE_BIO_CARGO PK CD_VBC NR_DATA_INI NR_DATA_FIM CD_VRB CD_CARGO SG_UF SG_PAIS CD_CARGO_FUNC CD_INS CD_CID AC_UNIDADE_DOCUMENTAL_FUNDO PK,FK1 SG_FUN PK,FK2 CD_UNIDADE_DOCUMENTAL AC_PERIODICO PK,FK1 CD_UNIDADE_DOCUMENTAL_PERIODICO CD_PRI CD_CLASSIFICACAO_PRI DS_TITULO_PRI NM_EDITOR_PRI NM_LOCAL_PUBLICACAO_PRI NM_PERIODICIDADE_PRI NM_IDIOMA_PRI DS_NOTAS_PRI DT_ATU_PRI CD_LOGIN_USUSIS AC_FUNDO PK SG_FUN NM_FUN DT_DOACAO_FUN QT_VOLUME_ESTIMADO_FUN DS_LOCALIZACAO_FISICA_FUN DS_LOCALIZACAO_DIGITAL_FUN DS_CODIGO_MICROFILME_FUN DS_EQUIPE_FUN DS_HISTORICO_ACERVO_FUN DS_CONTEUDO_FUN DS_NOTAS_FUN DT_ATU_FUN CD_LOGIN_USUSIS DT_ABERTURA_CONSULTA_FUN AC_ARQUIVO PK CD_ARQUIVO NM_ARQUIVO DS_CAMINHO_ARQUIVO DS_CAMINHO_ARQUIVO_ICON DS_TEXTO_ARQUIVO FK1 CD_TIPO_ARQUIVO PO_Perfil PK IDPerfil Nome FK1 CodigoSistema PERFIL_PESQUISA_DESCRITOR PK,FK1 CD_PESQUISA PK CD_DNE ENTREVISTA_ENTREVISTADO PK,FK1 CD_ENTREVISTA PK,FK2 CD_ENTREVISTADO DH_CIDADE CD_CID DS_CID DT_ATU_CID CD_LOGIN_USUSIS CD_RBR AC_USUARIO_FUNDO PK,FK3 CD_USU_FUN CD_USERID_USU FK4 CD_USUARIO FK1 SG_FUN FK2 CD_TIPO_UNIDADE_DOCUMENTAL AC_DOADOR_ENDERECO PK CD_DOADOR PK CD_TIPO_ENDERECO NM_LOGRADOURO NR_NUMERO NM_COMPLEMENTO NM_BAIRRO NM_CIDADE FK2 UF_SIGLA FK1 PA_SIGLA NR_CEP DT_ATU_DOE CD_LOGIN_USUSIS AC_AUDIOVISUAL PK,FK1 CD_UNIDADE_DOCUMENTAL_AUDIOVISUAL CD_AVI CD_TDA CD_CLASSIFICACAO_AVI CD_PREFIXO_TITULO_AVI DS_TITULO_AVI DS_PERIODO_PRODUCAO_AVI NR_ANO_PRODUCAO_DE_AVI NR_ANO_PRODUCAO_ATE_AVI CD_PDA NR_DOCUMENTOS_AVI DS_FISICA_AVI DS_NOTAS_AVI DS_RESUMO_AVI DT_ATU_AVI CD_LOGIN_USUSIS TECNICO_ENTREVISTA PK CD_TECNICO_ENTREVISTA FK1 CD_ENTREVISTA FK3 CD_TECNICO FK2 CD_FUNCAO PO_AreaAcademica PK IDAreaAcademica Nome PERFIL_PESQUISA_ENTREVISTADO PK,FK2 CD_PESQUISA PK,FK1 CD_ENTREVISTADO LOCALIDADE_ENTREVISTA PK,FK1 CD_ENTREVISTA PK,FK2 CD_LOCALIDADE DOADOR PK CD_DOADOR U1 NM_DOADOR DS_DOADOR AC_CAPITULO_LIVRO PK CD_UNIDADE_DOCUMENTAL_CAPITULO CD_CLI CD_LIV DS_TITULO_CLI CD_PAGINACAO_CLI DS_NOTAS_CLI DT_ATU_CLI CD_LOGIN_USUSIS FK1 CD_UNIDADE_DOCUMENTAL_LIVRO PO_Usuario PK IDUsuario Tipo Nome Email LoginFGV Senha DataNascimento Sexo FK2 IDGrauInstrucao FK1 IDAreaAcademica Cidade IDUF IDPais FlagSpan DataCriacao Status Guid DataBloqueio FlagSpanArquivologia FlagSpanCienciasSociais FlagSpanHistoria FlagSpanNewsletterCPDOC PO_Solicitacao PK IDSolicitacao Descricao DataAbertura DataEncerramento Status Resultado OmitirUsuario FK1 IDUsuarioExterno FK2 IDUsuarioResponsavel IDUD IDArquivo IDAnotacaoUD IDAnotacaoArquivo PO_GrauInstrucao PK IDGrauInstrucao Nome ENTREVISTA_ENTREVISTADOR PK,FK1 CD_ENTREVISTA PK,FK2 CD_ENTREVISTADOR AC_TIPO_ARQUIVO PK CD_TIPO_ARQUIVO NM_TIPO_ARQUIVO AC_ARQUIVO_COLECAO PK,FK1 CD_ARQUIVO PK,FK2 CD_COLECAO NR_SEQUENCIA PO_Mensagem PK IDMensagem Texto DataCriacao FK1 IDSolicitacao IDUsuario PERFIL_PESQUISA_USUARIO PK CD_PESQUISA CD_USUARIO DT_PESQUISA_PPU CD_TIPO_CONSULTA CD_TIPO_PESQ_DEL LOG_OPERACAO PK Codigo Operacao Tabela Dados Data ENTREVISTA_PROJETO PK,FK1 CD_ENTREVISTA PK,FK2 CD_PROJETO ENTREVISTA PK CD_ENTREVISTA DS_OBJ_ENTREVISTA U1 NM_TIT_ENTREVISTA IN_TIP_ENTREVISTA CD_DOADOR DT_DOACAO_ENTREVISTA DT_LIBERACAO_ENTREVISTA DS_OBSERVACAO IN_ARQ_DOC_ENTREVISTA CD_SIT_ENTREVISTA DS_REF_BIBLIOGRAFICA DS_OBSERVACAO_FINAL QT_DISQ_TRANSC_ENTREVISTA NM_ARQ_TRANSC_ENTREVISTA CD_DOC_TRANSC_ENTREVISTA NU_PAGS_TRANSC_ENTREVISTA DS_RESTRICOES_ACESSO DS_JUSTIFICATIVA NU_DISQ_TRANSC_ENTREVISTA NM_PASTA_TRANSC_ENTREVISTA IN_FICHA_TRANSC_ENTREVISTA IN_FL_ROSTO_TRANSC_ENTREVISTA IN_PROC_ENTREVISTA NU_DISQ_AUDIO_ENTREVISTA NM_PASTA_AUDIO_ENTREVISTA IN_FICHA_AUDIO_ENTREVISTA IN_FL_ROSTO_AUDIO_ENTREVISTA NU_DISQ_AVISUAL_ENTREVISTA NM_PASTA_AVISUAL_ENTREVISTA IN_FICHA_AVISUAL_ENTREVISTA IN_FL_ROSTO_AVISUAL_ENTREVISTA DT_PREENCH_REL_ENTREVISTA DS_CONTATO_ENTREVISTA DS_LOC_ENTREVISTA DS_ANDAMENTO_ENTREVISTA DS_MUDANCA_ENTREVISTA DS_INTERRUPCAO_ENTREVISTA DS_PESSOA_PRESENTE DS_COMENT_CESSAO_ENTREVISTA DS_OUTRAS_OBSERVACOES DT_ASSINATURA_CPDOC_ENTREVISTA CD_ASSINADO_CPDOC_ENTREVISTA DS_HERDEIRO_CPDOC_ENTREVISTA DS_RESTRICAO_CPDOC_ENTREVISTA DS_ENCAM_CPDOC_ENTREVISTA DS_COND_USO_CPDOC_ENTREVISTA DS_OBS_GRAVACAO DS_RESUMO_FICHA_TEC NM_TIT_ENTREVISTA_PESQ DS_TEXTO_PUBLICACAO_CITACAO DS_SUMARIO CD_TRANSC_ENTREVISTA FK1 CD_UNIDADE_DOCUMENTAL IN_COBERTURA DH_GOVERNO PK CD_GOV DS_GOV NR_DATA_INIC_GOV NR_DATA_FIM_GOV DT_ATU_GOV CD_LOGIN_USUSIS AC_TITULAR_FUNDO PK CD_TFU NM_TFU FK1 SG_FUN NR_NASCIMENTO_DIA_TFU NR_NASCIMENTO_MES_TFU NR_NASCIMENTO_ANO_TFU NM_PAI_TFU NM_MAE_TFU NM_CONJUGE_TFU NR_FALECIMENTO_DIA_TFU NR_FALECIMENTO_MES_TFU NR_FALECIMENTO_ANO_TFU DS_OUTRAS_ATIVIDADES_TFU DS_NOTAS_TFU DT_ATU_TFU CD_LOGIN_USUSIS FK2 CD_LOCALIDADE_NASCIMENTO FK3 CD_LOCALIDADE_FALECIMENTO AC_LOCALIDADE PK CD_LOCALIDADE NM_LOCALIDADE FK1 CD_LOCALIDADE_PAI FK2 CD_TIPO_LOCALIDADE TECNICO_PROJETO PK,FK1 CD_PROJETO PK,FK2 CD_TECNICO AJUDA PK CD_AJUDA DS_TEXTO_AJUDA FK1 CD_FUNCIONALIDADE AC_TIPO_LOCALIDADE PK CD_TIPO_LOCALIDADE NM_TIPO_LOCALIDADE AC_DESCRITOR_NAO_ELEITO PK CD_DNE U1 DS_DNE CD_DEL DT_ATU_DNE CD_LOGIN_USUSIS IN_DNE_DEL AC_ARQUIVO_FUNDO PK,FK1 CD_ARQUIVO PK,FK2 SG_FUN NR_SEQUENCIA TEMA_ENTREVISTA PK,FK1 CD_ENTREVISTA PK CD_DEL SESSAO_GRAVACAO PK CD_SESSAO_GRAVACAO FK1 CD_ENTREVISTA NU_SESS_GRAV DT_SESS_GRAV QT_HR_SESS_GRAV QT_MIN_SESS_GRAV DS_LOCAL DS_OBSERVACOES FUNCAO PK CD_FUNCAO NM_FUNCAO ENTREVISTA_SUMARIO PK CD_ENTREVISTA_SUMARIO FK1 CD_ENTREVISTA DS_SUMARIO FK2 CD_TIPO_SUMARIO DH_VERBETE_IMPORTADO FK1 CD_VRB Nome Conteudo InAntigo CONDICAO_ACESSO PK CD_CONDICAO_ACESSO DS_CONDICAO_ACESSO DS_CONDICAO_ACESSO_EXIBICAO AC_TIPO_UNIDADE_DOCUMENTAL PK CD_TIPO_UNIDADE_DOCUMENTAL NM_TIPO_UNIDADE_DOCUMENTAL SG_TIPO_UNIDADE_DOCUMENTAL FK1 CD_TIPO_UNIDADE_DOCUMENTAL_PAI NR_SEQUENCIA_EXIBICAO PERMISSAO PK CD_PERMISSAO FK2 CD_PERFIL_ACESSO FK1 CD_FUNCIONALIDADE IN_ACESSO IN_INCLUSAO IN_ALTERACAO IN_EXCLUSAO PAIS PK PA_SIGLA PA_NOME PA_NOME_INGLES FUNCIONALIDADE PK CD_FUNCIONALIDADE NM_FUNCIONALIDADE SG_FUNCIONALIDADE FK1 CD_SISTEMA ARQUIVO_DIGITAL PK CD_ARQUIVO_DIGITAL DS_CAMINHO_ARQUIVO FK3 CD_TIPO_ARQUIVO IN_LIBERADO_CONSULTA DS_METADADOS DS_LEGENDA FK1 CD_ENTREVISTA DS_URL_ARQUIVO FK2 CD_SESSAO_GRAVACAO AC_PRECISAO_DATA PK CD_PDA DS_PDA AC_LOCALIDADE_UNIDADE_DOCUMENTAL PK,FK2 CD_UNIDADE_DOCUMENTAL PK,FK1 CD_LOCALIDADE AC_COLECAO PK CD_COLECAO NM_COLECAO A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 5 / 22
  • 6. Problems Currently architecture is hard and costly to maintain and improve given the relational model nature and systems; innovative initiatives are usually postponed; The data is available online but on the “deep web”; CPDOC’s do not adopt any standard data model or vocab: (1) inhibit interoperability with other open resources; and (2) hardly useful for people outside CPDOC. data ïŹles (audio, videos and images) scattered in diïŹ€erent ïŹle servers, DB only stores metadata and ïŹle paths (loose coupling). A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 6 / 22
  • 7. Some inconsistencies “verbete” is a dictionary entry. “bio cargo” is a position (“cargo”) that the described person had during a speciïŹc time during which he/she carried on a particular assignment (“funcao”). Controled lists but no standards! Double relation between “bio cargo” and “cargo”. A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 7 / 22
  • 8. Inconsistencies are not always straightforward to ïŹx DELETE { ?bioc cpdoc:dbo_DH_VERBETE_BIO_CARGO_CD_CARGO ?cargo } INSERT { graph <http://cpdoc.fgv.br/sys/update1/> { ?bioc cpdoc:dbo_DH_VERBETE_BIO_CARGO_CD_CARGO_FUNC _:funcao . _:funcao rdf:type cpdoc:dbo_DH_CARGO_FUNCAO ; cpdoc:dbo_DH_CARGO_FUNCAO_CD_CARGO ?cargo . } } WHERE { ?bioc cpdoc:dbo_DH_VERBETE_BIO_CARGO_CD_CARGO ?cargo . filter not exists { ?bioc cpdoc:dbo_DH_VERBETE_BIO_CARGO_CD_CARGO_FUNC ?cf . ?cf cpdoc:dbo_DH_CARGO_FUNCAO_CD_CARGO ?cargo . } } A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 8 / 22
  • 9. . . . when we recognize the battle against chaos, mess, and unmastered complexity as one of computing science’s major callings, we must admit that “Beauty is our Business”. (Edsger W. Dijkstra) Some beautiful arguments using mathematical induction. http: // goo. gl/ KQ9j7Q .
  • 10. The Long Run Project Joint project between CPDOC and EMAp (Mathematical School); Enrich the structure (semantics) of CPDOC data; Open and expose CPDOC’s data and architecture making it more maintainable and dynamic; Uniform and integrated data treatment (standards and interlinks between collections). A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 10 / 22
  • 11. Motivations Open Linked Data Initiative Principals; Distributed open source development model/tools (collaborative data maintenance and creation); From data owner to data curator; A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 11 / 22
  • 12. The migration process (1) D2RQ was extracted RDF from relational; (2) enrichment of data semantics (next slides); (3) DHBB entries to simple markdown ïŹles with YAML headers; (4) PHO and Accessus collections are moved to DRMS (standards vocab, access control, faced search, long-term preservation, OAI-PMH support etc. A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 12 / 22
  • 13. The desired architecture A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 13 / 22
  • 14. Improving semantics 1-1 with original relational DB. The connection of technician and interview is parameterized by diïŹ€erent roles, the donator, interviewer and interviewed of an interview are modeled each one in a speciïŹc table. In this case interviewed, interviewer, donator and technician are all people (“ad hoc” modeling). A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 14 / 22
  • 15. Improving semantics prov centric but uses skos, dc, foaf, bio and geo, frbr etc. some classes can be subclasses of standard classes, Interview, some classes can be replaced by standard classes, localidade. A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 15 / 22
  • 16. Conclusions Challenge 1: convince CPDOC researchers to make the transition to data owners to curators. Challenge 2: adapt researchers to new technologies (VC, text editors, scripts?, distributed workïŹ‚ow etc) Model reïŹnements (corrections, transformations by alignments) can be not straightforward. Still a lot to be done. For instance... A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 16 / 22
  • 17. Other Research Opportunities Natural language processing: processing the DHBB entries to discover relations between entries and with other linked data and resources. DHBB for NLP and vice versa! Ontology alignmnent algorithms for (semi-)automated the model transformations. A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 17 / 22
  • 18. Natural Language Processing Manually discovered ≈ 50 links to dbpedia (Presidents of Brazil, presidents of the Senate, political parties etc.) NLP and text mining of DHBB entries: (1) proper names; (2) word sense disambiguation using the openWordnet-PT (lexical resource); and (3) named entity recognition and creation of links between DHBB entries. 133,036 proper names identiïŹed (some few mistakes). Potencially entities (people, locations, organizations etc) Use grammars, lexical resources, formal ontologies, and logical tools to reason about knowledge obtained from processing text in Portuguese (Computational Semantics: KB, KR, and ATP); A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 18 / 22
  • 19. Natural Language Processing A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 19 / 22
  • 20. Audio and Transcriptions Sinal processing to (semi-) automatic produce transcriptions, alignment with already available transcriptions and audio segmentation (interviewer/inverviwed); A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 20 / 22
  • 21. Faces recognition and identiïŹcation Image processing techniques to face recognition in photos collections. A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 21 / 22
  • 22. Obrigado! S: (v) thank, give thanks (express gratitude or show appreciation to) (=> (and (instance ?THANK Thanking) (agent ?THANK ?AGENT) (patient ?THANK ?THING) (destination ?THANK ?PERSON)) (and (instance ?PERSON Human) (or (holdsDuring (WhenFn ?THANK) (wants ?AGENT ?THING)) (holdsDuring (WhenFn ?THANK) (desires ?AGENT ?THING))))) SUMO Ontology, http://www.ontologyportal.org A. Rademaker, S. Higuchi, D. Oliveira (IBM Research and FGV/EMAp, FGV/CPDOC) September 25, 2013 22 / 22