
Accept any interpretable representation of kanji (index numbers, UTF-8 character strings of length 1, UTF-8 codepoints or kanjivec objects) and convert it to any or all of these formats.

Usage

convert_kanji(
  key,
  output = c("all", "index", "character", "hexmode", "kanjivec"),
  simplify = TRUE
)

Arguments

key

an atomic vector or list of kanji in any combination of formats.

output

a string describing the desired output: one of "all", "index", "character", "hexmode" or "kanjivec". The default is "all".

simplify

logical. Whether to simplify the output to an atomic vector or to keep the structure of the original vector. In either case, whether this is possible depends on output.

Value

A vector of the same length as key. If simplify is TRUE, this is an atomic vector for output = "index", "character" or "hexmode", and a list for output = "kanjivec" or "all". If simplify is FALSE, the original structure (atomic or list) is kept whenever possible.

Details

Index numbers are in terms of the order in kbase. UTF-8 codepoints are usually of class "hexmode", but character strings starting with "0x" or "0X" are also accepted in the key.
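The relationship between a character, its codepoint and the "hexmode" class can be illustrated with base R alone. The following is a minimal sketch that does not call convert_kanji(); the variable names are illustrative, and a UTF-8 session is assumed.

```r
# Base R only; illustrates the formats, not the internals of convert_kanji().
ch <- "馬"

# Character -> integer codepoint -> "hexmode" object
cp <- utf8ToInt(ch)    # integer codepoint of the single character
hx <- as.hexmode(cp)   # same number, now of class "hexmode"
hex_digits <- format(hx)

# "0x"-prefixed string -> "hexmode" -> character
# (stripping the prefix by hand is a choice made for this sketch)
hx2 <- as.hexmode(sub("^0[xX]", "", "0x99ac"))
ch2 <- intToUtf8(as.integer(hx2))
```

Here hx and hx2 represent the same codepoint, and ch2 is the same character as ch.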

For output = "kanjivec", the GitHub package kanjistat.data must be available; otherwise an error is raised. For output = "all", the kanjivec component is set to NA if kanjistat.data is not available.
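To avoid the error in scripts that may run without the data package, the requested output can be chosen at run time. This is a sketch only: it assumes that "available" means the package can be loaded via requireNamespace(), and the fallback to "character" is an arbitrary choice for illustration.

```r
# Check for the optional data package (requireNamespace() is base R).
has_data <- requireNamespace("kanjistat.data", quietly = TRUE)

# Only call convert_kanji() if kanjistat itself is installed.
if (requireNamespace("kanjistat", quietly = TRUE)) {
  # Assumption: fall back to "character" when kanjivec data is missing.
  out <- if (has_data) "kanjivec" else "character"
  res <- kanjistat::convert_kanji("馬", output = out)
}
```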

Examples

convert_kanji(as.hexmode("99ac"))
#> $index
#> [1] 155
#> 
#> $character
#> [1] "馬"
#> 
#> $hexmode
#> [1] "99ac"
#> 
#> $kanjivec
#> Kanjivec representation of 馬 (Unicode 99ac)
#> 10 stroke vector graphics with depth 1 decomposition
#> 
convert_kanji("0x99ac")  # same
#> $index
#> [1] 155
#> 
#> $character
#> [1] "馬"
#> 
#> $hexmode
#> [1] "99ac"
#> 
#> $kanjivec
#> Kanjivec representation of 馬 (Unicode 99ac)
#> 10 stroke vector graphics with depth 1 decomposition
#> 
convert_kanji(500, "character") == kbase$kanji[500]  # TRUE
#> [1] TRUE