Edit File: universaldetector.cpython-36.opt-1.pyc
3 `9Y�0������������������@���s����d�Z�ddlZddlZddlZddlmZ�ddlmZmZm Z �ddl mZ�ddlm Z �ddlmZ�dd lmZ�G�d d��de�ZdS�)a�� Module containing the UniversalDetector detector class, which is the primary class a user of ``chardet`` should use. :author: Mark Pilgrim (initial port to Python) :author: Shy Shalom (original C code) :author: Dan Blanchard (major refactoring for 3.0) :author: Ian Cordasco �����N����)�CharSetGroupProber)� InputState�LanguageFilter�ProbingState)�EscCharSetProber)�Latin1Prober)�MBCSGroupProber)�SBCSGroupProberc������������ ���@���sn���e�Zd�ZdZdZejd�Zejd�Zejd�Z dddd d ddd d�Z ejfdd�Z dd��Zdd��Zdd��ZdS�)�UniversalDetectoraq�� The ``UniversalDetector`` class underlies the ``chardet.detect`` function and coordinates all of the different charset probers. To get a ``dict`` containing an encoding and its confidence, you can simply run: .. code:: u = UniversalDetector() u.feed(some_bytes) u.close() detected = u.result g�������?s���[�-�]s���(|~{)s���[�-�]zWindows-1252zWindows-1250zWindows-1251zWindows-1256zWindows-1253zWindows-1255zWindows-1254zWindows-1257)z iso-8859-1z iso-8859-2z iso-8859-5z iso-8859-6z iso-8859-7z iso-8859-8z iso-8859-9ziso-8859-13c�������������C���sN���d�|�_�g�|�_d�|�_d�|�_d�|�_d�|�_d�|�_||�_tj t �|�_d�|�_|�j ���d�S�)N)�_esc_charset_prober�_charset_probers�result�done� _got_data�_input_state� _last_char�lang_filter�loggingZ getLogger�__name__�logger�_has_win_bytes�reset)�selfr�����r����'/usr/lib/python3.6/universaldetector.py�__init__Q���s����zUniversalDetector.__init__c�������������C���sZ���dddd�|�_�d|�_d|�_d|�_tj|�_d|�_|�jr>|�jj ���x|�j D�]}|j ���qFW�dS�)z� Reset the UniversalDetector and all of its probers back to their initial states. This is called by ``__init__``, so you only need to call this directly in between analyses of different documents. 
Ng��������)�encoding� confidence�languageF�����)r���r���r���r���r���� PURE_ASCIIr���r���r���r���r ���)r����proberr���r���r���r���^���s���� zUniversalDetector.resetc�������������C���s>��|�j�r dS�t|�sdS�t|t�s(t|�}|�js�|jtj�rJdddd�|�_nv|jtj tj f�rldddd�|�_nT|jd�r�dddd�|�_n:|jd �r�d ddd�|�_n |jtjtjf�r�dddd�|�_d|�_|�jd �dk r�d|�_�dS�|�j tjk�r.|�jj|��rtj|�_ n*|�j tjk�r.|�jj|�j|���r.tj|�_ |dd��|�_|�j tjk�r�|�j�s^t|�j�|�_|�jj|�tjk�r:|�jj|�jj��|�jjd�|�_d|�_�n�|�j tjk�r:|�j�s�t |�j�g|�_|�jt!j"@��r�|�jj#t$����|�jj#t%����x@|�jD�]6}|j|�tjk�r�|j|j��|jd�|�_d|�_�P��q�W�|�j&j|��r:d|�_'dS�)a��� Takes a chunk of a document and feeds it through all of the relevant charset probers. After calling ``feed``, you can check the value of the ``done`` attribute to see if you need to continue feeding the ``UniversalDetector`` more data, or if it has made a prediction (in the ``result`` attribute). .. note:: You should always call ``close`` when you're done feeding in your document if ``done`` is not already ``True``. Nz UTF-8-SIGg�������?��)r���r���r���zUTF-32s�������zX-ISO-10646-UCS-4-3412s�������zX-ISO-10646-UCS-4-2143zUTF-16Tr���r������)(r����len� isinstance� bytearrayr���� startswith�codecs�BOM_UTF8r����BOM_UTF32_LE�BOM_UTF32_BE�BOM_LE�BOM_BEr���r���r!����HIGH_BYTE_DETECTOR�search� HIGH_BYTE�ESC_DETECTORr���Z ESC_ASCIIr���r���r����feedr���ZFOUND_IT�charset_name�get_confidencer���r ���r ���r���ZNON_CJK�appendr ���r����WIN_BYTE_DETECTORr���)r���Zbyte_strr"���r���r���r���r3���o���s|���� zUniversalDetector.feedc������� ������C���s���|�j�r|�jS�d|�_�|�js&|�jjd��n�|�jtjkrBdddd�|�_n�|�jtjkr�d}d}d}x,|�j D�]"}|slqb|j ��}||krb|}|}qbW�|r�||�jkr�|j}|jj ��}|j ��}|jd �r�|�jr�|�jj||�}|||jd�|�_|�jj��tjk�rz|�jd �dk�rz|�jjd��xn|�j D�]d}|�s �qt|t��rZxF|jD�] }|�jjd|j|j|j �����q4W�n|�jjd|j|j|j �����qW�|�jS�) z� Stop analyzing the current document and come up with a final prediction. :returns: The ``result`` attribute, a ``dict`` with the keys `encoding`, `confidence`, and `language`. 
Tzno data received!�asciig�������?r#���)r���r���r���Ng��������ziso-8859r���z no probers hit minimum thresholdz%s %s confidence = %s)r���r���r���r����debugr���r���r!���r1���r ���r5����MINIMUM_THRESHOLDr4����lowerr(���r����ISO_WIN_MAP�getr���ZgetEffectiveLevelr����DEBUGr&���r���Zprobers) r���Zprober_confidenceZmax_prober_confidenceZ max_proberr"���r4���Zlower_charset_namer���Zgroup_proberr���r���r����close����s`���� zUniversalDetector.closeN)r���� __module__�__qualname__�__doc__r:����re�compiler/���r2���r7���r<���r���ZALLr���r���r3���r?���r���r���r���r���r���3���s"��� mr���)rB���r)���r���rC���Zcharsetgroupproberr���Zenumsr���r���r���Z escproberr���Zlatin1proberr���Zmbcsgroupproberr ���Zsbcsgroupproberr ����objectr���r���r���r���r����<module>$���s���