Linux/linux 44f06ba fs/udf unicode.c

udf: Fix leak of UTF-16 surrogates into encoded strings

The OSTA UDF specification does not say whether the CS0 charset, when
encoded with two bytes per character, should be treated as UTF-16 or
UCS-2. The sample code in the standard does not treat UTF-16 surrogates
in any special way, but on systems such as Windows, which work in UTF-16
internally, filenames are effectively treated as UTF-16. In Linux it is
more difficult to handle characters outside the Base Multilingual Plane
(beyond 0xffff) because the NLS framework works with 2-byte characters
only. For now, just make sure we don't leak UTF-16 surrogates into the
resulting string when loading names from the filesystem.
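
For illustration only, a minimal standalone sketch of the kind of check
described above. It is not the code added to fs/udf/unicode.c: the helper
names, the big-endian decoding loop, and the '?' substitution are all
assumptions made for this example. The one firm fact it relies on is that
UTF-16 surrogates occupy the range 0xD800..0xDFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UTF-16 surrogates occupy 0xD800..0xDFFF: the top five bits are 11011. */
static int is_utf16_surrogate(uint16_t c)
{
	return (c & 0xf800) == 0xd800;
}

/*
 * Hypothetical helper, not the kernel's function: read 16-bit units from
 * `in` (assumed big-endian, two bytes per character) and store them in
 * `out`, replacing every surrogate with '?' so that none reaches the
 * resulting string. Returns the number of units written.
 */
static size_t cs0_16_filter_surrogates(const uint8_t *in, size_t in_len,
				       uint16_t *out, size_t out_cap)
{
	size_t n = 0;

	assert(in != NULL && out != NULL);
	for (size_t i = 0; i + 1 < in_len && n < out_cap; i += 2) {
		uint16_t c = (uint16_t)((in[i] << 8) | in[i + 1]);

		out[n++] = is_utf16_surrogate(c) ? (uint16_t)'?' : c;
	}
	return n;
}
```

With this sketch, a name containing a surrogate pair keeps its BMP
characters and gets one placeholder per surrogate unit, rather than
passing half of a pair on to the NLS conversion.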

CC: stable at vger.kernel.org # >= v4.6
Reported-by: Mingye Wang <arthur200126 at gmail.com>
Signed-off-by: Jan Kara <jack at suse.cz>
Delta   File
+6 -0   fs/udf/unicode.c
+6 -0   1 files
