I have a data set in which a customer may have had multiple assessments over a period of time. The timing varies by customer: C1 may have had theirs on day 1, day 90, day 200, etc., while C2 may have had theirs on day 1, day 14, day 21, day 36, and so on.

My goal is to convert these individual rows into columns. Has anyone had a similar requirement before? I am using Excel, Power Query, or SQL to solve this problem.

CustomerId | Date | Score
C1 | 1/1/2020 | 9
C1 | 1/7/2020 | 14
C1 | 1/14/2020 | 26
C2 | 1/9/2020 | 34
C2 | 3/9/2020 | 30
C2 | 6/9/2020 | 24

The output should be in the format below:

Customer | Initial Score | Avg_3_months | Avg_6_months | Avg_9_months | Avg_12_months and so on.

Asked Nov 18, 2024 at 20:54 by Data123; edited Nov 19, 2024 at 6:56 by samhita.
  • Please edit your question to provide more background details on your problem. Telling readers about your research, what you have already tried, and why it didn't meet your needs will allow them to understand it better. – Michal, commented Nov 18, 2024 at 21:43
  • Provide a sample of input and output. If one customer shows day 14 and day 36, and another customer shows day 90 and day 200, then how do you want to show +3 months and +6 months? What's the explicit calculation? – horseyride, commented Nov 18, 2024 at 21:48
  • Use a pivot to convert your rows into columns. – Julio Gadioli Soares, commented Nov 18, 2024 at 22:57
  • Could you please also provide your expected output as a markdown table, like the sample data? – Ryan, commented Nov 19, 2024 at 0:33

2 Answers

The query below uses SQL Server syntax (demo: https://dbfiddle.uk/8WMfYXO4).

-- First assessment date per customer
WITH initial AS (
    SELECT CustomerId, MIN(Date) AS initial_date
    FROM data
    GROUP BY CustomerId
)
SELECT
    a.CustomerId,
    -- Score recorded on the first assessment date
    MAX(CASE WHEN a.Date = i.initial_date THEN a.Score END) AS Initial_Score,
    -- Each bucket is a window (start, end] measured from the first assessment date.
    -- Note: AVG over an INT column returns an INT in SQL Server; cast Score to a
    -- decimal type first if fractional averages matter.
    AVG(CASE WHEN a.Date > i.initial_date AND a.Date <= DATEADD(month, 3, i.initial_date)
        THEN a.Score END) AS Avg_3m,
    AVG(CASE WHEN a.Date > DATEADD(month, 3, i.initial_date) AND a.Date <= DATEADD(month, 6, i.initial_date)
        THEN a.Score END) AS Avg_6m,
    AVG(CASE WHEN a.Date > DATEADD(month, 6, i.initial_date) AND a.Date <= DATEADD(month, 9, i.initial_date)
        THEN a.Score END) AS Avg_9m,
    AVG(CASE WHEN a.Date > DATEADD(month, 9, i.initial_date) AND a.Date <= DATEADD(month, 12, i.initial_date)
        THEN a.Score END) AS Avg_12m
FROM data a
JOIN initial i ON a.CustomerId = i.CustomerId
GROUP BY a.CustomerId
ORDER BY a.CustomerId;

Output:

CustomerId | Initial_Score | Avg_3m | Avg_6m | Avg_9m | Avg_12m
C1 | 9 | 20 | null | null | null
C2 | 34 | 30 | 24 | null | null
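
For anyone who wants to sanity-check the bucket boundaries outside the database, here is a minimal Python sketch of the same logic. It is only an illustration, not part of the answer's query: the helper names `add_months` and `summarize` are made up for this example, dates are assumed to be M/d/yyyy text, and month arithmetic clamps to the last day of the month.

```python
import calendar
from collections import defaultdict
from datetime import date, datetime


def add_months(d, months):
    """Shift a date by whole months, clamping to the last day of the target month."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def summarize(rows, window=3, buckets=4):
    """rows: iterable of (customer, 'M/d/yyyy' text, score).

    Returns {customer: [initial_score, avg_bucket_1, ..., avg_bucket_n]}.
    """
    by_customer = defaultdict(list)
    for customer, text, score in rows:
        parsed = datetime.strptime(text, "%m/%d/%Y").date()
        by_customer[customer].append((parsed, score))

    result = {}
    for customer, visits in sorted(by_customer.items()):
        first_date = min(d for d, _ in visits)
        # Same tie-break as the SQL: MAX of the scores on the first date.
        initial = max(s for d, s in visits if d == first_date)
        averages = []
        for k in range(buckets):
            lo = add_months(first_date, window * k)
            hi = add_months(first_date, window * (k + 1))
            # Window is exclusive at the start and inclusive at the end.
            scores = [s for d, s in visits if lo < d <= hi]
            averages.append(sum(scores) / len(scores) if scores else None)
        result[customer] = [initial] + averages
    return result


rows = [
    ("C1", "1/1/2020", 9), ("C1", "1/7/2020", 14), ("C1", "1/14/2020", 26),
    ("C2", "1/9/2020", 34), ("C2", "3/9/2020", 30), ("C2", "6/9/2020", 24),
]
table = summarize(rows)
```

On the sample data this gives C1 an initial score of 9 and a first-bucket average of 20.0, and C2 an initial score of 34 with averages of 30.0 and 24.0, with the remaining buckets empty (`None`). Unlike the SQL `AVG` over an integer column, the division here is not truncated.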

How to correctly parse 1/14/2020 dates on all machines

That date uses the format string M/d/yyyy.

Sometimes setting the culture alone is enough for the import to detect the format. When you change column types, include the optional culture argument.

To be safe

You should set Format and Culture explicitly. That ensures parsing dates and numbers is always deterministic.

Date format strings are listed here: powerquery.how/Date.FromText

Fun tip: there are combinations where even a "safe" ISO-style format such as yyyy/MM/dd isn't actually unambiguous.

= Date.FromText( "1/14/2020", [ Format = "M/d/yyyy" , Culture = "en-us"] )
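
To show why the explicit format matters, here is a small Python sketch of the same idea. It is an analogy rather than Power Query code: `parse_date` and `parses` are hypothetical helpers written for this example, built on the standard `datetime.strptime`.

```python
from datetime import datetime


def parse_date(text, fmt="%m/%d/%Y"):
    """Parse text with an explicit format so the result never depends on locale."""
    return datetime.strptime(text, fmt).date()


def parses(text, fmt):
    """Return True if text is a valid date under fmt, False otherwise."""
    try:
        parse_date(text, fmt)
        return True
    except ValueError:
        return False


# "1/9/2020" is valid under both conventions but means two different days.
month_first = parse_date("1/9/2020", "%m/%d/%Y")  # January 9
day_first = parse_date("1/9/2020", "%d/%m/%Y")    # September 1

# "1/14/2020" only works month-first, because there is no month 14.
ok_month_first = parses("1/14/2020", "%m/%d/%Y")
ok_day_first = parses("1/14/2020", "%d/%m/%Y")
```

The sample data contains both kinds of value: C2's `1/9/2020` would silently load as a different date on a day-first machine, while C1's `1/14/2020` would fail outright.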

Stand Alone Example

let 
    // Parse one raw line such as "C1 1/14/2020 9" into a typed record
    ConvertRecord = ( source as text ) as record => [
        // split the function argument (not the outer sample value)
        segments = Text.Split( source, " " ),
        parsedDate = Date.FromText( segments{1}, [ Format = "M/d/yyyy", Culture = "en-us" ] ),
        return = [ 
            Customer = segments{0},
            Date = parsedDate,
            // convert to a number so the score can be averaged later
            Score = Number.FromText( segments{2}, "en-us" )
        ]
    ][return],
    
    // quick check of the function on a single line
    sample = "C1 1/14/2020 9", 
    test = ConvertRecord( sample ),

    // sample input rows
    source = #table( 
        type table [ RawText = text ],
        {   {"C1 1/1/2020 9"}, 
            {"C1 1/7/2020 14"},
            {"C1 1/14/2020 26" }
        } ),

    // parse every raw line into a record
    convertRecords = Table.TransformColumns( source, {
        {"RawText", ConvertRecord, type record }} ),

    expandRecords = Table.ExpandRecordColumn( convertRecords, "RawText", {"Customer", "Date", "Score"}, {"Customer", "Date", "Score"})
in 
    expandRecords

Next step: Grouping

  • Use Group By on the CustomerId column (named Customer in the standalone example above)
  • Add another column and choose "All Rows"

Now you have a nested table partitioned per customer. (That means your calculation doesn't have to filter by customer.)

Or that part might make more sense as a DAX measure. Then you can filter pre-aggregates.
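
As a rough illustration of what the "All Rows" grouping produces, here is a Python sketch. It is an analogy, not Power Query: `group_all_rows` is a hypothetical helper, and the rows are assumed to be already parsed into dates and numbers.

```python
from collections import defaultdict
from datetime import date


def group_all_rows(rows, key="CustomerId"):
    """Partition rows by key, keeping every row of each group as a nested list."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


rows = [
    {"CustomerId": "C1", "Date": date(2020, 1, 1), "Score": 9},
    {"CustomerId": "C1", "Date": date(2020, 1, 7), "Score": 14},
    {"CustomerId": "C1", "Date": date(2020, 1, 14), "Score": 26},
    {"CustomerId": "C2", "Date": date(2020, 1, 9), "Score": 34},
    {"CustomerId": "C2", "Date": date(2020, 3, 9), "Score": 30},
    {"CustomerId": "C2", "Date": date(2020, 6, 9), "Score": 24},
]

nested = group_all_rows(rows)

# Each per-customer calculation now sees only that customer's rows,
# for example the score on the earliest assessment date.
initial_scores = {
    customer: min(group, key=lambda r: r["Date"])["Score"]
    for customer, group in nested.items()
}
```

Any further per-customer step, such as the 3-month bucket averages, can then run against each nested group without a customer filter.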
