Convert between kanji formats — convert

Accept any interpretable representation of kanji (index numbers, UTF-8 character strings of length 1, UTF-8 codepoints or kanjivec objects) and convert it to any or all of these formats.

Usage

convert_kanji(
  key,
  output = c("all", "index", "character", "hexmode", "kanjivec"),
  simplify = TRUE
)

Arguments

key: an atomic vector or list of kanji in any combination of formats.
output: a string describing the desired output format: one of "all" (the default), "index", "character", "hexmode" or "kanjivec".
simplify: logical. Whether to simplify the output to an atomic vector or keep the structure of the original vector. In either case it depends on output whether this is possible.

Value

A vector of the same length as key. If simplify is TRUE, this is an atomic vector for output = "index", "character" or "hexmode", and a list for output = "kanjivec" or "all". If simplify is FALSE, the original structure (atomic or list) is kept whenever possible.
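
The effect of simplify can be sketched as follows. This is a minimal illustration, assuming the kanjistat package is installed and that the three keys (a character, an index number and a "0x" string, all taken from the Examples) are interpreted as described above. The variable names are only for illustration.

```r
library(kanjistat)

# The same kanji given three ways: character, index number, "0x" codepoint string
keys <- list("馬", 155, "0x99ac")

# simplify = TRUE (the default): an atomic vector is expected for "character"
chars <- convert_kanji(keys, output = "character")
is.atomic(chars)

# simplify = FALSE: the list structure of `keys` should be kept where possible
chars_list <- convert_kanji(keys, output = "character", simplify = FALSE)
is.list(chars_list)

# In both cases the result should have the same length as the key
length(chars) == length(keys)
```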

Details

Index numbers are in terms of the order in kbase. UTF-8 codepoints are usually of class "hexmode", but character strings starting with "0x" or "0X" are also accepted in the key.
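
The relation between a character, its codepoint and its index can be sketched like this. The base R calls (utf8ToInt, as.hexmode) are standard. The variable names are hypothetical, and the example assumes kanjistat is installed.

```r
library(kanjistat)

# Base R: the codepoint of a single character, as a "hexmode" object
cp <- as.hexmode(utf8ToInt("馬"))
format(cp)  # "99ac"

# A "0x"-prefixed string and the hexmode object should denote the same kanji
idx_from_string  <- convert_kanji("0x99ac", output = "index")
idx_from_hexmode <- convert_kanji(cp, output = "index")
identical(idx_from_string, idx_from_hexmode)

# The index refers to the row order of kbase, so this looks up the same kanji
kbase$kanji[idx_from_string]
```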

For output = "kanjivec", the GitHub package kanjistat.data has to be available; otherwise an error is raised. For output = "all", the component kanjivec is set to NA if kanjistat.data is not available.
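
One way to avoid the error is to check for kanjistat.data before asking for kanjivec output. The helper below is hypothetical and not part of kanjistat. It also assumes that requireNamespace() is an adequate test of availability, which may differ from the check that convert_kanji performs internally.

```r
library(kanjistat)

# Hypothetical helper (not part of kanjistat): return the kanjivec output
# when kanjistat.data is installed, and NULL with a message otherwise.
kanjivec_or_null <- function(key) {
  if (!requireNamespace("kanjistat.data", quietly = TRUE)) {
    message("kanjistat.data is not installed; skipping kanjivec conversion.")
    return(NULL)
  }
  convert_kanji(key, output = "kanjivec")
}

kv <- kanjivec_or_null("馬")
```

Returning NULL keeps calling code simple (a single is.null() check); wrapping the call in tryCatch() would be an alternative if the error condition itself is of interest.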

Examples

convert_kanji(as.hexmode("99ac"))
#> $index
#> [1] 155
#> 
#> $character
#> [1] "馬"
#> 
#> $hexmode
#> [1] "99ac"
#> 
#> $kanjivec
#> Kanjivec representation of 馬 (Unicode 99ac)
#> 10 stroke vector graphics with depth 1 decomposition
#> 
convert_kanji("0x99ac")  # same
#> $index
#> [1] 155
#> 
#> $character
#> [1] "馬"
#> 
#> $hexmode
#> [1] "99ac"
#> 
#> $kanjivec
#> Kanjivec representation of 馬 (Unicode 99ac)
#> 10 stroke vector graphics with depth 1 decomposition
#> 
convert_kanji(500, "character") == kbase$kanji[500]  # TRUE
#> [1] TRUE