393. UTF-8 Validation

Dare2Solve

Dare2Solve

393. UTF-8 Validation
SAMSUNG 49-Inch Odyssey G9
SAMSUNG 49-Inch Odyssey G9
Because earth is not flat

Description

The problem is to determine whether a list of integers represents a valid UTF-8 encoding. Each integer represents a byte, and we need to verify that they form a valid sequence according to UTF-8 encoding rules.

Intuition

In UTF-8, a byte sequence can be 1 to 4 bytes long, starting with a leading byte followed by continuation bytes. By counting the leading 1's in the binary representation of each byte, we can determine if the sequence is valid.

Approach

  1. Iterate through each byte in the array:
    • Count the number of leading 1's.
    • If there are more than 4 leading 1's, return false.
    • If the count indicates the start of a multi-byte sequence, store the count of remaining continuation bytes.
    • If the byte is expected to be a continuation byte but doesn't follow the format, return false.
  2. After processing all bytes, ensure no unprocessed continuation bytes remain.

Complexity

Time Complexity:

O(n), where n is the number of bytes in the input array.

Space Complexity:

O(1), as we use constant extra space.

Code

C++

class Solution {
public:
    bool validUtf8(vector<int>& data) {
        int n = data.size();
        int temp, count, storeCount = 0;

        for (int i = 0; i < n; i++) {
            temp = 128; // 0b10000000
            count = 0;

            // Count leading ones in current byte
            while ((data[i] & temp) != 0) {
                count++;
                temp >>= 1;
            }

            if (count > 4) {
                return false;
            } else if (count > 1) {
                if (storeCount != 0) {
                    return false;
                }
                storeCount = count - 1;
            } else if (count == 0) {
                if (storeCount != 0) {
                    return false;
                }
            } else {
                if (storeCount >= 1) {
                    storeCount--;
                } else {
                    return false;
                }
            }
        }

        return storeCount == 0;
    }
};

Python

class Solution:
    def validUtf8(self, data: List[int]) -> bool:
        storeCount = 0

        for byte in data:
            temp = 128  # 0b10000000
            count = 0

            # Count leading ones in current byte
            while byte & temp != 0:
                count += 1
                temp >>= 1

            if count > 4:
                return False
            elif count > 1:
                if storeCount != 0:
                    return False
                storeCount = count - 1
            elif count == 0:
                if storeCount != 0:
                    return False
            else:
                if storeCount >= 1:
                    storeCount -= 1
                else:
                    return False

        return storeCount == 0

Java

class Solution {
    public boolean validUtf8(int[] data) {
        int n = data.length;
        int temp, count, storeCount = 0;

        for (int i = 0; i < n; i++) {
            temp = 128; // 0b10000000
            count = 0;

            // Count leading ones in current byte
            while ((data[i] & temp) != 0) {
                count++;
                temp >>= 1;
            }

            if (count > 4) {
                return false;
            } else if (count > 1) {
                if (storeCount != 0) {
                    return false;
                }
                storeCount = count - 1;
            } else if (count == 0) {
                if (storeCount != 0) {
                    return false;
                }
            } else {
                if (storeCount >= 1) {
                    storeCount--;
                } else {
                    return false;
                }
            }
        }

        return storeCount == 0;
    }
}

JavaScript

var validUtf8 = function (data) {

    let n = data.length
    let temp = 128
    let count = 0
    let storeCount = 0

    for (let i = 0; i < n; i++) {
        temp = 128
        count = 0
        while ((data[i] & temp) != 0) {
            count++
            temp >>>= 1
        }
        if (count > 4) {
            return false
        } else if (count > 1) {
            if (storeCount != 0) {
                return false
            }
            storeCount = count - 1
        } else if (count == 0) {
            if (storeCount != 0) {
                return false
            }
        } else {
            if (storeCount >= 1) {
                storeCount--
            } else {
                return false
            }
        }
    }
    if (storeCount > 0) {
        return false
    }
    return true
};