Accept any interpretable representation of kanji in terms of index numbers, UTF-8 character strings of length 1, UTF-8 codepoints or kanjivec objects and convert it to all or any of these formats.
Usage
convert_kanji(
  key,
  output = c("all", "index", "character", "hexmode", "kanjivec"),
  simplify = TRUE
)
Arguments
- key
an atomic vector or list of kanji in any combination of formats.
- output
a string describing the desired output.
- simplify
logical. Whether to simplify the output to an atomic vector or keep the structure of the original vector. In either case it depends on output whether this is possible.
Value
A vector of the same length as key. If simplify is TRUE, this is an atomic vector for output = "index", "character" or "hexmode", and a list for output = "kanjivec" or "all". If simplify is FALSE, the original structure (atomic or list) is kept whenever possible.
Details
Index numbers are in terms of the order in kbase. UTF-8 codepoints are usually of class "hexmode", but character strings starting with "0x" or "0X" are also accepted in the key.
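The relationship between the character, codepoint and "0x" string forms can be sketched in base R alone. This is only an illustration of the formats described above, not of how convert_kanji works internally; the variable names are made up for the example, and index numbers are left out because they depend on kbase.

```r
# Base R only; nothing here calls convert_kanji().
ch <- "\u99ac"                         # the kanji used in the Examples, as a UTF-8 string

# Character -> integer codepoint -> object of class "hexmode"
cp  <- utf8ToInt(ch)
hex <- as.hexmode(cp)
format(hex)                            # hexadecimal digits of the codepoint

# "0x"-prefixed string -> integer codepoint -> character
cp2 <- strtoi("0x99ac", base = 16L)
intToUtf8(cp2)
```

Both routes end at the same codepoint, which is why a "hexmode" object and a "0x" string can stand in for each other in the key.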
For output = "kanjivec", the GitHub package kanjistat.data has to be available; otherwise an error is thrown. For output = "all", component kanjivec is set to NA if kanjistat.data is not available.
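One way to avoid the error is to check for the data package before requesting kanjivec output. The following is a minimal sketch, assuming that convert_kanji is provided by a package named kanjistat; the variables has_data and out, and the fallback to "character", are choices made for this illustration only.

```r
# Check for the optional data package before asking for kanjivec output.
has_data <- requireNamespace("kanjistat.data", quietly = TRUE)

# Fall back to a format that needs no extra data when the package is missing.
out <- if (has_data) "kanjivec" else "character"

# Assumption: convert_kanji() is provided by the kanjistat package.
if (requireNamespace("kanjistat", quietly = TRUE)) {
  res <- kanjistat::convert_kanji("\u99ac", output = out)
}
```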
Examples
convert_kanji(as.hexmode("99ac"))
#> $index
#> [1] 155
#>
#> $character
#> [1] "馬"
#>
#> $hexmode
#> [1] "99ac"
#>
#> $kanjivec
#> Kanjivec representation of 馬 (Unicode 99ac)
#> 10 stroke vector graphics with depth 1 decomposition
#>
convert_kanji("0x99ac") # same
#> $index
#> [1] 155
#>
#> $character
#> [1] "馬"
#>
#> $hexmode
#> [1] "99ac"
#>
#> $kanjivec
#> Kanjivec representation of 馬 (Unicode 99ac)
#> 10 stroke vector graphics with depth 1 decomposition
#>
convert_kanji(500, "character") == kbase$kanji[500] # TRUE
#> [1] TRUE