Try to parse this file with the following Perl script:

#!/usr/bin/env perl

use strict;
use warnings;
use HTML::HTML5::Parser;

use utf8;                            # for the characters in the script.
use open ':encoding(UTF-8)';         # for the file arguments.
binmode STDIN, ':encoding(UTF-8)';   # for stdin.
binmode STDOUT, ':encoding(UTF-8)';  # for stdout.

@ARGV == 1 or die "Usage: $0 <file.html>\n";

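# Parse the file into an XML::LibXML document.
my $parser = HTML::HTML5::Parser->new;
my $doc = $parser->parse_file($ARGV[0]);

# Report the character set the parser determined, then dump the document.
print "Charset: '", $parser->charset($doc), "'\n";
print $doc->toString();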

See Debian bug 750946.

For the test: "↓"