I am collecting data from each month on measured parts into a single file for the whole year. When using a "query from folder" I am able to get all the data together, formatted, and sorted with one exception. Every part has an "A" and a "B" version. Unfortunately, due to production order, sometimes the "B" part is measured before the "A" part. In this case I would not want to sort by time as the order would then go, for example, A,B,A,B,B,A,A,B,A,B. I want it to always place the "A" part before the "B" part. Parts are measured twice per day so I cannot sort by day and then part letter because it would then go, for example, A,A,B,B,A,A,B,B. How can I sort the data such that it goes by day, then time, then overwrites time to keep the A,B,A,B pattern?

To further complicate things, sometimes the data collector messes up and mislabels one of the parts. In this case it would sort as, for example, A,B,A,B,A,A,A,B,A,B. How can I find this error and correct it automatically before pasting the consolidated data into a table?

(Data has been oversimplified for confidentiality reasons)

You can see on May 2nd in the morning A/B are reversed because B data was taken before A data. Sorting the data by time messed up the order.

You can see on April 2nd in the morning (1PM is morning shift) there are two A parts when one of them should be B (for this error we can assume they were taken in the order of "A" before "B" so the time of data collection applies).

I am new to using queries and honestly struggling hard on this one. Please help me not only solve this problem but also understand it.

Here are text versions of the data:

Apr

Date Time Letter Data
4/1/2024 7:25:08 AM A 0.7
4/1/2024 7:30:56 AM B 0.5
4/1/2024 8:32:51 PM A 0.6
4/1/2024 8:36:44 PM B 0.5
4/2/2024 1:32:59 PM A 1
4/2/2024 1:38:36 PM A 0.5
4/2/2024 8:46:11 PM A 0.7
4/2/2024 8:51:31 PM B 0.7

May

Date Time Letter Data
5/1/2024 1:35:12 PM A 0.6
5/1/2024 1:39:05 PM B 0.4
5/1/2024 6:07:11 PM A 0.8
5/1/2024 6:10:43 PM B 0.5
5/2/2024 10:59:32 AM A 0.8
5/2/2024 8:42:16 AM B 0.1
5/2/2024 6:15:07 PM A 0.4
5/2/2024 6:18:40 PM B 0.2

YTD (Current output)

Date Time Letter Data
4/1/2024 7:25:08 AM A 0.7
4/1/2024 7:30:56 AM B 0.5
4/1/2024 8:32:51 PM A 0.6
4/1/2024 8:36:44 PM B 0.5
4/2/2024 1:32:59 PM A 1
4/2/2024 1:38:36 PM A 0.5
4/2/2024 8:46:11 PM A 0.7
4/2/2024 8:51:31 PM B 0.7
5/1/2024 1:35:12 PM A 0.6
5/1/2024 1:39:05 PM B 0.4
5/1/2024 6:07:11 PM A 0.8
5/1/2024 6:10:43 PM B 0.5
5/2/2024 8:42:16 AM B 0.1
5/2/2024 10:59:32 AM A 0.8
5/2/2024 6:15:07 PM A 0.4
5/2/2024 6:18:40 PM B 0.2

YTD (Desired output)

Date Time Letter Data
4/1/2024 7:25:08 AM A 0.7
4/1/2024 7:30:56 AM B 0.5
4/1/2024 8:32:51 PM A 0.6
4/1/2024 8:36:44 PM B 0.5
4/2/2024 1:32:59 PM A 1
4/2/2024 1:38:36 PM B 0.5
4/2/2024 8:46:11 PM A 0.7
4/2/2024 8:51:31 PM B 0.7
5/1/2024 1:35:12 PM A 0.6
5/1/2024 1:39:05 PM B 0.4
5/1/2024 6:07:11 PM A 0.8
5/1/2024 6:10:43 PM B 0.5
5/2/2024 10:59:32 AM A 0.8
5/2/2024 8:42:16 AM B 0.1
5/2/2024 6:15:07 PM A 0.4
5/2/2024 6:18:40 PM B 0.2

I used a simple Power Query query (as seen here) and then, in order, changed all the data types to the correct type, sorted by date, sorted by time, removed the source name, and removed duplicates.

I cannot rely on file names to sort the data and keep each file's internal order, because I am pulling data from sheets that all have the same name but sit in their own monthly folders. Folders sort alphabetically, so the order of the months would be wrong if I didn't sort it manually.

asked Dec 12, 2024 at 18:52 by LightOfTheNight; edited Dec 13, 2024 at 20:41 by davidebacci
  • Maybe take one slice of data and tell us what the output would look like after you adjust it for all potential errors, and the method you used to adjust each of those, so we can reproduce it – horseyride Commented Dec 12, 2024 at 19:09
  • @horseyride Thank you for the suggestion. I have simplified it as much as I think I can without losing the point – LightOfTheNight Commented Dec 12, 2024 at 20:20
  • Ok good luck with that – horseyride Commented Dec 12, 2024 at 20:30
  • How can you tell the difference between A and B versions where B was produced first, vs. a labelling error by the data collector? – Ron Rosenfeld Commented Dec 12, 2024 at 22:05
  • Could you please provide the expected output? There are three A rows on 4/2. How should these three A rows be ordered? – Ryan Commented Dec 13, 2024 at 0:34

2 Answers

Try this:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Time", type number}, {"Letter", type text}, {"Data", type number}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1,  Int64.Type),
    Page = Table.TransformColumns( #"Added Index", {{"Index", each if Number.Mod(_,2) = 0 then _ else _-1}}),
    #"Grouped Rows" = Table.Group(Page, {"Index"}, {{"All", each _, type table [Date=nullable date, Time=nullable number, Letter=nullable text, Data=nullable number, Index=number]}}),
    Logic = Table.TransformColumns( #"Grouped Rows",{{"All", (x)=> let
        // each group holds one pair of rows that share the same even Index
        rec1 = x{0},
        rec2 = x{1},
        same = (rec1[Letter] = rec2[Letter]),
        // same letter twice: the earlier row becomes "A" and the later row "B";
        // Index+1 marks the row that should sort first, Index+2 the row that sorts second
        logic1 = if same and rec1[Time] <= rec2[Time] then  {Record.Combine({rec1, [Letter="A", Index=rec1[Index]+1]}),Record.Combine({rec2, [Letter="B", Index=rec2[Index]+2]})} else 
        if same and rec1[Time] > rec2[Time] then {Record.Combine({rec1, [Letter="B",Index=rec1[Index]+2]}),Record.Combine({rec2, [Letter="A", Index=rec2[Index]+1]})} else 
        // different letters: keep the labels and only set the order so that "A" precedes "B"
        if not same and rec1[Letter] = "A" then {Record.Combine({rec1, [Index=rec1[Index]+1]}),Record.Combine({rec2, [ Index=rec2[Index]+2]})} else 
        if not same and rec1[Letter] = "B" then {Record.Combine({rec1, [Index=rec1[Index]+2]}),Record.Combine({rec2, [ Index=rec2[Index]+1]})} else null
        in Table.FromRecords( logic1)
        }}),
    #"Removed Columns" = Table.RemoveColumns(Logic,{"Index"}),
    #"Expanded All" = Table.ExpandTableColumn(#"Removed Columns", "All", {"Date", "Time", "Letter", "Data", "Index"}, {"Date", "Time", "Letter", "Data", "Index"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Expanded All",{{"Date", type date}, {"Time", type number}, {"Letter", type text}, {"Data", type number}, {"Index", Int64.Type}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type1",{{"Index", Order.Ascending}})
in
    #"Sorted Rows"

This code seems to work with your data. It may need to be modified depending on your actual setup.

  • Assumes each month's table is read in as a separate query and that the data types are set properly.

  • Add a blank query.

  • Paste the code below into the Advanced Editor.

  • Rename that query as indicated in the code comments.

//Rename: fnFixLetter

(tbl as table)=>

let 

//Sort by date and time, ascending
   #"Sorted Rows" = Table.Sort(tbl,{{"Date", Order.Ascending}, {"Time", Order.Ascending}}),

//ASSUMPTION: there are NO missing entries, so grouping by pairs after sorting by date and time will always return the relevant rows
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1, Int64.Type),
    #"Inserted Integer-Division" = Table.AddColumn(#"Added Index", "Integer-Division", each Number.IntegerDivide([Index], 2), Int64.Type),

//Remove unneeded Index Column
    #"Removed Columns" = Table.RemoveColumns(#"Inserted Integer-Division",{"Index"}),

//Group by Integer-Divide (the pairs)
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"Integer-Division"}, {
        
    //Do the magic
        {"all", (t)=> let 
            ltrs = t[Letter],

            //if the letters are identical then they should be {"A","B"} (rows are already in time order)
            order = if List.Count(List.Distinct(ltrs)) = 1 then {"A","B"} else ltrs,

            //Replace the Letters column with either {"A","B"} or leave what was there
            tbl= Table.FromColumns(
                    Table.ToColumns(
                        Table.RemoveColumns(t,{"Letter","Integer-Division"}))
                     & {order},{"Date","Time","Data","Letter"}),

            //resort each pair by Letter
            reSort = Table.Sort(tbl,{"Letter", Order.Ascending})
                    
            in reSort,
            type table[Date=nullable date, Time=nullable time,Data=nullable number, Letter=nullable text] 

        }}),

//Cleanup
    #"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"Integer-Division"}),
    #"Expanded all" = Table.ExpandTableColumn(#"Removed Columns1", "all", {"Date", "Time", "Data", "Letter"}),
    #"Reorder Columns" = Table.ReorderColumns(#"Expanded all", {"Date","Time","Letter","Data"})
in 
        #"Reorder Columns"
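
The "no missing entries" assumption in the comments above can be checked before running the function. The following is only a sketch, not part of the original answer: it counts rows per date and returns any date with an odd count, since one unpaired row would shift every later pair. The inline `Sample` table is hypothetical; it reuses the April rows from the question with the last row dropped to simulate a missing measurement.

```powerquery
let
    // Hypothetical test data: April rows from the question, minus the final 4/2 "B" row
    Sample = #table(
        type table [Date = date, Time = time, Letter = text, Data = number],
        {
            {#date(2024, 4, 1), #time(7, 25, 8),   "A", 0.7},
            {#date(2024, 4, 1), #time(7, 30, 56),  "B", 0.5},
            {#date(2024, 4, 1), #time(20, 32, 51), "A", 0.6},
            {#date(2024, 4, 1), #time(20, 36, 44), "B", 0.5},
            {#date(2024, 4, 2), #time(13, 32, 59), "A", 1},
            {#date(2024, 4, 2), #time(13, 38, 36), "A", 0.5},
            {#date(2024, 4, 2), #time(20, 46, 11), "A", 0.7}
        }
    ),
    // Count the rows recorded on each date
    Counts = Table.Group(Sample, {"Date"}, {{"Rows", each Table.RowCount(_), Int64.Type}}),
    // A date with an odd row count cannot be split cleanly into A/B pairs
    OddDays = Table.SelectRows(Counts, each Number.Mod([Rows], 2) <> 0)
in
    OddDays
```

With this sample the result should contain only the 4/2/2024 row. An empty result means every date has an even number of rows, which is necessary for the pairing to work but does not by itself prove that the pairs are the right ones.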

Now, in a new query, combine all of the existing tables:

let 

//List of all the tables to be combined, in order
    tbls = {April, May},

    append = List.Accumulate(
        tbls,
        #table({},{}),
        (s,c)=> Table.Combine({s,fnFixLetter(c)})
    )
in 
    append
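
If you would rather review mislabeled pairs than have them corrected silently, the same pairing idea can be used to list them. This is a minimal sketch under the same assumptions as the function above (rows sorted by date and time, no missing entries); the step names and the inline `Sample` table are illustrative, using the 4/2 and 5/2 rows from the question.

```powerquery
let
    // Rows for 4/2 and 5/2 copied from the question's sample data
    Sample = #table(
        type table [Date = date, Time = time, Letter = text, Data = number],
        {
            {#date(2024, 4, 2), #time(13, 32, 59), "A", 1},
            {#date(2024, 4, 2), #time(13, 38, 36), "A", 0.5},
            {#date(2024, 4, 2), #time(20, 46, 11), "A", 0.7},
            {#date(2024, 4, 2), #time(20, 51, 31), "B", 0.7},
            {#date(2024, 5, 2), #time(10, 59, 32), "A", 0.8},
            {#date(2024, 5, 2), #time(8, 42, 16),  "B", 0.1},
            {#date(2024, 5, 2), #time(18, 15, 7),  "A", 0.4},
            {#date(2024, 5, 2), #time(18, 18, 40), "B", 0.2}
        }
    ),
    Sorted = Table.Sort(Sample, {{"Date", Order.Ascending}, {"Time", Order.Ascending}}),
    Indexed = Table.AddIndexColumn(Sorted, "Index", 0, 1, Int64.Type),
    // Rows 0-1 form pair 0, rows 2-3 form pair 1, and so on
    Paired = Table.AddColumn(Indexed, "Pair", each Number.IntegerDivide([Index], 2), Int64.Type),
    // Summarize each pair: its date, first timestamp, and the letters it contains
    Summary = Table.Group(Paired, {"Pair"}, {
        {"Date", each List.First([Date]), type date},
        {"FirstTime", each List.First([Time]), type time},
        {"Letters", each Text.Combine([Letter], ","), type text}
    }),
    // A healthy pair has exactly one "A" and one "B", in either order
    Suspects = Table.SelectRows(Summary, each [Letters] <> "A,B" and [Letters] <> "B,A")
in
    Suspects
```

On this sample the query should return only the pair from the afternoon of 4/2 (letters "A,A"). The 5/2 pair measured B-first is not flagged, because a reversed pair is a production-order effect rather than a labeling error.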

