Package 'ojsr'

Title: Crawler and Data Scraper for Open Journal System ('OJS')
Description: Crawler for 'OJS' pages and scraper for meta-data from articles. You can crawl 'OJS' archives, issues, articles, galleys, and search results. You can scrape articles metadata from their head tag in html, or from Open Archives Initiative ('OAI') records. Most of these functions rely on 'OJS' routing conventions (<https://docs.pkp.sfu.ca/dev/documentation/en/architecture-routes>).
Authors: Gaston Becerra [aut, cre]
Maintainer: Gaston Becerra <[email protected]>
License: GPL-3
Version: 0.1.5
Built: 2025-03-13 05:17:56 UTC
Source: https://github.com/gastonbecerra/ojsr

Help Index


Scraping articles URLs from the ToC of OJS issues

Description

Takes a vector of OJS (issue) URLs and scrapes the links to articles from the issues table of content

Usage

get_articles_from_issue(input_url, verbose = FALSE)

Arguments

input_url

Character vector.

verbose

Logical.

Value

A long-format dataframe with the url you provided (input_url) and the articles url scrapped (output_url)

Examples

issue <- 'https://revistas.ucn.cl/index.php/saludysociedad/issue/view/65'
articles <- ojsr::get_articles_from_issue(input_url = issue)

Scraping galleys URLs from OJS articles

Description

Takes a vector of OJS URLs and scrapes all the galleys URLs from the article view

Usage

get_galleys_from_article(input_url, verbose = FALSE)

Arguments

input_url

Character vector.

verbose

Logical.

Value

A long-format dataframe with the url you provided (input_url), the articles url scrapped (output_url), the format of the galley (format), and the url that forces download of the galley (download_url)

Examples

article <- 'https://revistapsicologia.uchile.cl/index.php/RDP/article/view/55657'
galleys <- ojsr::get_galleys_from_article(input_url = article)

Scraping metadata from the OJS articles HTML

Description

Takes a vector of OJS URLs and scrapes all metadata written in HTML from the article view

Usage

get_html_meta_from_article(input_url, verbose = FALSE)

Arguments

input_url

Character vector.

verbose

Logical.

Value

A long-format dataframe with the url you provided (input_url), the name of the metadata (meta_data_name), the content of the metadata (meta_data_content), the standard in which the content is annotated (meta_data_scheme), and the language in which the metadata was entered (meta_data_xmllang)

Examples

article <- 'https://dspace.palermo.edu/ojs/index.php/psicodebate/article/view/516/311'
metadata <- ojsr::get_html_meta_from_article(article)

Scraping issues’ URLs from the OJS issues archive

Description

Takes a vector of OJS URLs and scrapes the issues URLs from the issue archive.

Usage

get_issues_from_archive(input_url, verbose = FALSE)

Arguments

input_url

Character vector.

verbose

Logical.

Value

A long-format dataframe with the url you provided (input_url) and the url of issues found (output_url)

Examples

journal <- 'https://dspace.palermo.edu/ojs/index.php/psicodebate/issue/archive'
issues <- ojsr::get_issues_from_archive(input_url = journal)

Retrieving OAI records for OJS articles

Description

This functions access OAI records (within OJS) for any article for which you provided an URL.

Usage

get_oai_meta_from_article(input_url, verbose = FALSE)

Arguments

input_url

Character vector.

verbose

Logical.

Details

Several limitations are in place. Please refer to vignette.

Value

A long-format dataframe with the url you provided (input_url), the name of the metadata (meta_data_name), and the content of the metadata (meta_data_content).

Examples

article <- 'https://dspace.palermo.edu/ojs/index.php/psicodebate/article/view/516/311'
metadata_oai <- ojsr::get_oai_meta_from_article(input_url = article)

Parses urls against OJS routing conventions and retrieves the base url

Description

Takes a vector of urls and parses them according to OJS routing conventions, then retrieves OJS base url.

Usage

parse_base_url(input_url)

Arguments

input_url

Character vector.

Value

A vector of the same length of your input.

Examples

mix_links <- c(
  'https://dspace.palermo.edu/ojs/index.php/psicodebate/issue/archive',
  'https://publicaciones.sociales.uba.ar/index.php/psicologiasocial/article/view/2903'
)
base_url <- ojsr::parse_base_url(input_url = mix_links)

Parses urls against OJS routing conventions and retrieves the OAI url

Description

Takes a vector of urls and parses them according to OJS routing conventions, then retrieves OAI entry url.

Usage

parse_oai_url(input_url)

Arguments

input_url

Character vector.

Value

A vector of the same length of your input.

Examples

mix_links <- c(
  'https://dspace.palermo.edu/ojs/index.php/psicodebate/issue/archive',
  'https://publicaciones.sociales.uba.ar/index.php/psicologiasocial/article/view/2903'
)
oai_url <- ojsr::parse_oai_url(input_url = mix_links)