Given the untyped_sequence and int_sequence below:

typedef struct {
    void* data;       // first item
    size_t size;      // number of items
    size_t item_size; // item byte size
} untyped_sequence;

typedef struct {
    int* data;        // first int
    size_t size;      // number of ints
    size_t item_size; // int byte size
} int_sequence;

QUESTION: Is it UB to put them as two union members, initialize an instance of that union using the int_sequence member, and then mutate the int data using the untyped_sequence member?

  • If yes - why?
  • If no - why?

GCC, Clang and MSVC give no warnings about this, but that doesn't necessarily mean anything.

Minimal runnable example (https://godbolt./z/PT6ahh4qq):

#include <string.h>
#include <stdio.h>

typedef struct {
    void* data;       // first item
    size_t size;      // number of items
    size_t item_size; // item byte size
} untyped_sequence;

typedef struct {
    int* data;        // first int
    size_t size;      // number of ints
    size_t item_size; // int byte size
} int_sequence;

typedef union {
    int_sequence typed;
    untyped_sequence untyped;
} sequence;

void untyped_zero_first(untyped_sequence untyped) {
    memset(untyped.data, 0, untyped.size * untyped.item_size);
}

int main(void) {
    int ints[4] = {1, 2, 3, 4};
    sequence s = {
        .typed.data      = ints,
        .typed.size      = 4,
        .typed.item_size = sizeof(int)
    };
    untyped_zero_first(s.untyped);
    // prints "0, 0, 0, 0" for GCC, Clang, MSVC - but is it UB?
    printf("%d, %d, %d, %d\n", ints[0], ints[1], ints[2], ints[3]);
}

Asked Nov 17, 2024 at 14:34 by Johann Gerell

Comments:

  • for me in this case, I see no value in that union. void * can be just converted to an int * easily. – KamilCuk, Nov 17, 2024 at 15:25
  • @Johann, Since a void * and int * may differ in size, code risks UB. Consider void * not fully well defined when int * is smaller. – chux, Nov 17, 2024 at 15:26
  • Although such architectures are uncommon, untyped_sequence and int_sequence could differ in size. – chux, Nov 17, 2024 at 15:39
  • @KamilCuk: I agree, as far as the example goes. But this is a minimal example of a much, much bigger scenario where it makes a lot of value. – Johann Gerell, Nov 17, 2024 at 16:23
  • @JohannGerell The tricky part about UB is that compilers use that excuse to make efficient code. Even if a compilation will emit desired functionality, a new compiler version (or perhaps with more optimizations enabled) may now do undesirable, yet efficient things. Best to avoid UB. – chux, Nov 17, 2024 at 18:22

2 Answers

Is this union pointer member type punning UB in C?

Yes, in that the language spec does not define the behavior (as opposed to explicitly declaring it undefined).

Unlike C++, C does not have a sense of an "active" member of a union. Accessing a different member than was initialized or last stored does not, in and of itself, produce undefined behavior. Since C17, the behavior is not even implementation-defined. You can just do it, which involves (as a note in the spec clarifies) reinterpreting the appropriate part of the stored value according to the type of the accessed member.

But in your particular case, that's not enough. C does not require that the size and representation of type void * be the same as the size and representation of type int *. As far as the spec is concerned, there is no telling, at the point where your example code calls untyped_zero_first(s.untyped), what s.untyped.data points to. It might even be a trap representation if your implementation's void * representation affords those.

In practice, you're unlikely to run into a modern platform in which different object pointer types in fact do have different size or representation, so your code is likely to work as intended, but C does not guarantee that.

  1. The result of punning the pointers and the other fields through the union depends on the implementation.

Union members share storage (C11, Section 6.7.2.1, Paragraph 16):

  • "A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa."

Reading a member other than the one last stored (C11, Section 6.5.2.3, footnote 95):

  • "If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called 'type punning'). This might be a trap representation."

  2. Using the pointers (it may invoke UB)

Effective Type Rule, also known as the Strict Aliasing Rule (C11, Section 6.5, Paragraph 7):

"An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type."

Answering in a few words:

  • the result of union type punning depends on the implementation's object representations
  • using the pointers depends on the referenced objects and pointer types. It may invoke undefined behaviour (UB)

Example with one access that does not invoke UB and one that does, assuming the implementation's pointer representations make the punning itself work:

#include <stdio.h>
#include <string.h>

typedef struct {
    void* data;       // first item
    size_t size;      // number of items
    size_t item_size; // item byte size
} untyped_sequence;

typedef struct {
    int* data;        // first int
    size_t size;      // number of ints
    size_t item_size; // int byte size
} int_sequence;


typedef struct {
    float* data;      // first float
    size_t size;      // number of floats
    size_t item_size; // float byte size
} float_sequence;


typedef union {
    int_sequence typed;
    untyped_sequence untyped;
    float_sequence floatseq;
} sequence;

void untyped_zero_first(untyped_sequence untyped) {
    memset(untyped.data, 0, untyped.size * untyped.item_size);
}

int main(void) {
    int ints[4] = {1, 2, 3, 4};

    // no UB here, provided the punned void* is a valid pointer to ints
    sequence s =
    {
        .typed.data      = ints,
        .typed.size      = 4,
        .typed.item_size = sizeof(int)
    };
    untyped_zero_first(s.untyped);
    printf("%d, %d, %d, %d\n", s.typed.data[0], s.typed.data[1], s.typed.data[2], s.typed.data[3]);

    // UB: the elements are int objects, but they are read through float lvalues
    printf("%f, %f, %f, %f\n", s.floatseq.data[0], s.floatseq.data[1], s.floatseq.data[2], s.floatseq.data[3]);
}
