Accept any interpretable representation of kanji (index numbers,
UTF-8 character strings of length 1, UTF-8 codepoints
or kanjivec objects) and convert it to any or all of these
formats.
Usage
convert_kanji(
  key,
  output = c("all", "index", "character", "hexmode", "kanjivec"),
  simplify = TRUE
)
Arguments
- key
- an atomic vector or list of kanji in any combination of formats.
- output
- a string naming the desired output format: one of "all", "index", "character", "hexmode" or "kanjivec".
- simplify
- logical. Whether to simplify the output to an atomic vector or to keep the structure of key. Whether either is possible depends on output.
Value
A vector of the same length as key. If simplify is TRUE, this is an
atomic vector for output = "index", "character" or "hexmode", and a list
for output = "kanjivec" or "all". If simplify is FALSE, the original
structure (atomic or list) is kept whenever possible.
Details
Index numbers are in terms of the order in kbase. UTF-8 codepoints are
usually of class "hexmode", but character strings starting
with "0x" or "0X" are also accepted in the key.
For output = "kanjivec", the GitHub package kanjistat.data has to be available;
otherwise an error is raised. For output = "all", the component kanjivec is set to NA if
kanjistat.data is not available.
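The two codepoint notations described above can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: codepoint_to_int is a hypothetical helper introduced here only to show how a "hexmode" object and a "0x"-prefixed string can denote the same codepoint.

```r
# Hypothetical helper (not part of kanjistat): reduce either codepoint
# notation to a plain integer codepoint.
codepoint_to_int <- function(x) {
  if (inherits(x, "hexmode")) {
    return(as.integer(x))
  }
  if (is.character(x) && length(x) == 1L && grepl("^0[xX]", x)) {
    # Strip the "0x"/"0X" prefix and parse the remainder as base 16.
    return(strtoi(sub("^0[xX]", "", x), base = 16L))
  }
  stop("not a recognised codepoint representation")
}

cp_hexmode <- codepoint_to_int(as.hexmode("99ac"))
cp_string  <- codepoint_to_int("0x99ac")

# Both notations give the same integer, which base R maps to and from
# the corresponding character.
kanji_char <- intToUtf8(cp_hexmode)
back_again <- format(as.hexmode(utf8ToInt(kanji_char)))
```

Here kanji_char is the character shown in the examples for codepoint 99ac, and back_again recovers the hexadecimal string from that character.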
Examples
convert_kanji(as.hexmode("99ac"))
#> $index
#> [1] 155
#> 
#> $character
#> [1] "馬"
#> 
#> $hexmode
#> [1] "99ac"
#> 
#> $kanjivec
#> Kanjivec representation of 馬 (Unicode 99ac)
#> 10 stroke vector graphics with depth 1 decomposition
#> 
convert_kanji("0x99ac")  # same
#> $index
#> [1] 155
#> 
#> $character
#> [1] "馬"
#> 
#> $hexmode
#> [1] "99ac"
#> 
#> $kanjivec
#> Kanjivec representation of 馬 (Unicode 99ac)
#> 10 stroke vector graphics with depth 1 decomposition
#> 
convert_kanji(500, "character") == kbase$kanji[500]  # TRUE
#> [1] TRUE
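As a further sketch, the examples above suggest that index 155, the character 馬, the hexmode value 99ac and the string "0x99ac" all denote the same kanji, so they can be mixed in one list key. The snippet assumes that kanjistat is installed. A list is used because combining these values with c() would coerce them to a single type. Output is omitted because it has not been verified here.

```r
library(kanjistat)

# One kanji in four key formats: index, character, hexmode, "0x" string.
key <- list(155, "\u99ac", as.hexmode("99ac"), "0x99ac")

# With simplify = TRUE (the default), the Value section describes an
# atomic vector for output = "character".
chars <- convert_kanji(key, output = "character")

# With simplify = FALSE, the structure of key is kept whenever possible.
chars_list <- convert_kanji(key, output = "character", simplify = FALSE)
```

In both calls the result should have the same length as key, as stated under Value.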