Text to Binary

Convert text strings into binary format.

Word Count Limit: 50

This online Text to Binary converter turns a string of characters into its binary representation: each character is encoded as its UTF-8 byte sequence, and each byte is written as eight binary digits.
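The conversion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the converter's actual code; the function name text_to_binary is a hypothetical choice.

```python
def text_to_binary(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated 8-bit binary codes."""
    # str.encode("utf-8") yields one to four bytes per character;
    # format(byte, "08b") writes each byte as eight binary digits.
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


print(text_to_binary("Hi"))  # 01001000 01101001
print(text_to_binary("é"))   # 11000011 10101001
```

An ASCII character such as 'H' produces one 8-bit group, while 'é' produces two because its UTF-8 encoding is two bytes long.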

 

How to Use

 

Input Text:

  • Type or paste the text you want to convert into the designated text area.

Upload File:

  • Alternatively, click on the "Upload File" button to select a text file from your device.

Convert to Binary:

  • Click the "Convert to Binary" button to initiate the conversion process.

View Binary Output:

  • The resulting binary codes, one 8-bit group for each byte of a character's UTF-8 sequence, will be displayed in the output area. Codes are separated by spaces for clarity.

Copy to Clipboard:

  • Click the "Copy to Clipboard" button to easily copy the binary codes. 

Save as Text File:

  • Click the "Save as TXT" button to download the binary codes as a text file to your device.

 

Binary Notation

 

A byte is a unit of digital information that consists of 8 bits. Each bit has a value of 0 or 1, so a group of eight bits can represent 256 distinct values, from 0 to 255. In binary notation, each bit position (numbered 0 to 7 from right to left) stands for a power of 2: 1, 2, 4, 8, 16, 32, 64 and 128. The value of a byte is the sum of the powers of 2 at the positions that hold a 1.

 

Example:

Let's consider the ASCII character 'A' with code point 65 (decimal).

65 (decimal), or 41 in hexadecimal, is written as 01000001 in binary notation.

Here, the bits at positions 0 and 6 (counting from the right, starting at 0) are set to 1, so the value of this binary number is 2^6 + 2^0 = 64 + 1 = 65.
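The same arithmetic can be checked with a short Python sketch. The helper name set_bit_positions is hypothetical and exists only for this illustration.

```python
def set_bit_positions(value: int) -> list:
    """Return the positions (0 = rightmost) of the 1 bits in a byte value."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte (0-255)")
    # Shift the bit of interest into the lowest position and test it.
    return [pos for pos in range(8) if (value >> pos) & 1]


code = ord("A")                          # 65
positions = set_bit_positions(code)
print(format(code, "08b"))               # 01000001
print(positions)                         # [0, 6]
print(sum(2 ** p for p in positions))    # 65
```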

 

Multibyte UTF-8 Character Encoding

 

UTF-8 is a variable-width character encoding that can represent every character in the Unicode character set. In UTF-8, each character is represented by one to four bytes. The encoding is designed to be backward-compatible with ASCII, and characters in the ASCII set are represented by a single byte in UTF-8. Other characters, including those in different languages and special symbols, are represented by multiple bytes. The format of the encoding is as follows:

Single-Byte Character (ASCII characters 0-127):

  • For characters in the ASCII character set (0-127), UTF-8 uses a single byte.
  • The value range for the first byte is 0 to 127.
  • The leftmost bit is always 0, and the remaining 7 bits represent the ASCII character: 
    0xxxxxxx.

Two-Byte Character (128-2047):

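  • Characters with code points from 128 to 2047 use two bytes.
  • The value range for the first byte is 194 to 223. The bit pattern would also allow 192 and 193, but those values could only encode code points below 128 and are invalid in UTF-8.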
  • The leading bits "110" indicate a two-byte character, and the following "10" bits mark continuation bytes: 
    110xxxxx 10xxxxxx.

Three-Byte Character (2048-65535):

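  • Characters with code points from 2048 to 65535 use three bytes. The surrogate code points 55296-57343 (U+D800 to U+DFFF) fall in this range but are not valid in UTF-8.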
  • The value range for the first byte is 224 to 239.
  • The leading bits "1110" indicate a three-byte character, and the following "10" bits mark continuation bytes: 
    1110xxxx 10xxxxxx 10xxxxxx.

Four-Byte Character (65536-1114111):

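  • Characters with code points from 65536 to 1114111 use four bytes.
  • The value range for the first byte is 240 to 244. The bit pattern would allow values up to 247, but 245 to 247 could only encode code points above 1114111 and are invalid in UTF-8.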
  • The leading bits "11110" indicate a four-byte character, and the following "10" bits mark continuation bytes: 
    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.

Note: the leading '10' identifies a byte as a continuation byte. The remaining bits, marked with 'x', store the bits of the code point.
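As a rough sketch, the four bit patterns above can be turned into a small Python encoder. The function name encode_code_point is hypothetical, and the surrogate check reflects the general UTF-8 rule rather than anything specific to this converter. In practice Python's built-in str.encode("utf-8") does the same job; the manual version only makes the bit layout visible.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 using the bit patterns above."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if cp < 0x80:       # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Compare the manual encoder with Python's built-in UTF-8 encoder.
for ch in ("A", "é", "€", "👍"):
    encoded = encode_code_point(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(ch, " ".join(format(b, "08b") for b in encoded))
```

The four sample characters need one, two, three and four bytes respectively, so the loop exercises every branch.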

 

Example:

Let's consider the Unicode character U+1F44D (thumbs up sign 👍).

Since 1F44D (hexadecimal) or 128077 (decimal) is in the range 65536-1114111, it will be encoded using four bytes:

11110000 10011111 10010001 10001101

Here, the first byte starts with "11110", indicating a four-byte character, and the following bytes start with "10", indicating continuation bytes. Using the remaining bits to form a binary number, we obtain 00001 11110100 01001101, which is the binary representation of 1F44D.
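The reverse step, stripping the marker bits and joining the rest, can be sketched as follows. The function name decode_first_character is hypothetical, and the sketch assumes well-formed input: it does not reject overlong or otherwise invalid sequences the way a full UTF-8 decoder would.

```python
def decode_first_character(data: bytes) -> int:
    """Return the code point of the first UTF-8 character in `data`."""
    first = data[0]
    if first < 0x80:                  # 0xxxxxxx: single byte
        return first
    if first >> 5 == 0b110:           # 110xxxxx: two bytes
        length, cp = 2, first & 0x1F
    elif first >> 4 == 0b1110:        # 1110xxxx: three bytes
        length, cp = 3, first & 0x0F
    elif first >> 3 == 0b11110:       # 11110xxx: four bytes
        length, cp = 4, first & 0x07
    else:
        raise ValueError("invalid leading byte")
    if len(data) < length:
        raise ValueError("truncated sequence")
    for byte in data[1:length]:
        if byte >> 6 != 0b10:
            raise ValueError("expected a continuation byte")
        # Append the six payload bits of each continuation byte.
        cp = (cp << 6) | (byte & 0x3F)
    return cp


thumbs_up = bytes([0b11110000, 0b10011111, 0b10010001, 0b10001101])
cp = decode_first_character(thumbs_up)
print(f"U+{cp:X}")          # U+1F44D
print(format(cp, "021b"))   # 000011111010001001101
```

The 21 payload bits printed on the last line are the same bits as 00001 11110100 01001101, only without the grouping.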