Introduction

Compact Encoding Detection (CED for short) is a library written in C++ that scans given raw bytes and detects the most likely text encoding.

Basic usage:

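#include <cstring>  // strlen

#include "compact_enc_det/compact_enc_det.h"

const char* text = "Input text";
bool is_reliable;    // set to true if the detector is confident in its guess
int bytes_consumed;  // set to the number of input bytes that were examined

Encoding encoding = CompactEncDet::DetectEncoding(
        text, static_cast<int>(strlen(text)),
        nullptr, nullptr, nullptr,  // no URL, HTTP charset, or meta charset hints
        UNKNOWN_ENCODING,           // no encoding hint
        UNKNOWN_LANGUAGE,           // no language hint
        CompactEncDet::WEB_CORPUS,  // treat the input as web text
        false,                      // do not ignore 7-bit mail encodings
        &bytes_consumed,
        &is_reliable);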
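
For intuition about what "detecting an encoding from raw bytes" involves, here is a toy sketch that is independent of CED and is not how CED works internally. The function name GuessEncodingToy and its string return values are hypothetical, chosen only for illustration. It checks for a byte-order mark and otherwise tests whether the bytes are structurally well-formed UTF-8; a real detector such as CED covers many more encodings and weighs the hints shown above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only; NOT part of CED.
// Returns "UTF-8", "UTF-16LE", "UTF-16BE", "ASCII", or "unknown".
std::string GuessEncodingToy(const std::string& bytes) {
  const auto* p = reinterpret_cast<const unsigned char*>(bytes.data());
  const std::size_t n = bytes.size();

  // Step 1: look for a byte-order mark (UTF-32 BOMs are ignored in this toy).
  if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) return "UTF-8";
  if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) return "UTF-16LE";
  if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) return "UTF-16BE";

  // Step 2: no BOM, so walk the bytes and check UTF-8 lead/continuation
  // structure. This is a simplified check: it does not reject overlong
  // forms or surrogate code points.
  bool saw_multibyte = false;
  std::size_t i = 0;
  while (i < n) {
    const unsigned char c = p[i];
    std::size_t extra = 0;  // number of continuation bytes expected
    if (c < 0x80) {
      extra = 0;
    } else if ((c & 0xE0) == 0xC0) {
      extra = 1;
    } else if ((c & 0xF0) == 0xE0) {
      extra = 2;
    } else if ((c & 0xF8) == 0xF0) {
      extra = 3;
    } else {
      return "unknown";  // stray continuation byte or invalid lead byte
    }
    if (extra > n - i - 1) return "unknown";  // sequence is truncated
    for (std::size_t k = 1; k <= extra; ++k) {
      if ((p[i + k] & 0xC0) != 0x80) return "unknown";
    }
    if (extra > 0) saw_multibyte = true;
    i += extra + 1;
  }
  return saw_multibyte ? "UTF-8" : "ASCII";
}
```

Under these assumptions, plain 7-bit input is reported as "ASCII", while a byte sequence that breaks the UTF-8 structure (for example, a lone Latin-1 accented byte) is reported as "unknown" rather than guessed. Settling that last case is what a statistical detector like CED is for.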

How to build

You need CMake to build the package. After unzipping the source code, run autogen.sh to build everything automatically. The script also downloads the Google Test framework, which is needed to build the unit test.

$ cd compact_enc_det
$ ./autogen.sh
...
$ bin/ced_unittest

On Windows, run cmake . to download the test framework and generate project files for Visual Studio.

D:\packages\compact_enc_det> cmake .