I have an Azure file share directory that contains files, folders, and files within those folders:
FOLDER1 -- file1, file2, file3 ...
FILE1
FILE2
FOLDER2 -- file1, file2 ...
FOLDER3 -- file1, file2, file3 ...
I would like to develop a pipeline that counts all the files in the file share directory itself as well as the files inside its folders.
My attempt so far: I used a Get Metadata activity to list all the childItems, but I am stuck on what to do next. I am a beginner with Azure.
asked Nov 16, 2024 at 23:19 by Shedrack
Comments:
- Do the files exist only in the last subfolder, or in parent and intermediate folders as well? – Rakesh Govindula, Nov 17, 2024 at 3:30
- The files exist in any parent or intermediate folder as well. – Shedrack, Nov 17, 2024 at 6:42
1 Answer
How to count all the files that exist in folder and its sub-folders using ADF pipeline
To count the files in a folder and its subfolders, combine the Get Metadata, Filter, and ForEach activities as follows:
- First, use a Get Metadata activity to get the child items (childItems) of the file share directory.
- Then use two Filter activities on the Get Metadata output to separate the folders from the files:
For folders - @equals(item().type,'Folder')
For files - @equals(item().type,'File')
- Pass the output of the folder filter to a ForEach loop.
- Inside that ForEach loop, repeat the same Get Metadata and Filter steps for each subfolder, and store each subfolder's file count in an array variable with an Append Variable activity.
- Use a second, sequential ForEach loop over that array variable to add up the subfolder file counts.
- Inside it, use two integer variables, subfilecount and subfilecounttemp, to keep the running total. Two variables are needed because a Set Variable activity cannot reference the variable it is setting.
- Finally, add this total to the file count of the directory itself, which was separated from the first Get Metadata output.
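To make the data flow easier to follow, here is a minimal Python sketch of the same steps. It is not ADF code and not part of the original answer. The CHILD_ITEMS dictionary, its file counts, and the helper names get_metadata, filter_items and count_files_one_level are all hypothetical stand-ins for illustration. The comments name the activity in the pipeline JSON that each line imitates.

```python
# Hypothetical stand-in for Get Metadata output: each folder name maps to a
# childItems list in the {"name", "type"} shape that the activity returns.
CHILD_ITEMS = {
    "root": [
        {"name": "FOLDER1", "type": "Folder"},
        {"name": "FILE1", "type": "File"},
        {"name": "FILE2", "type": "File"},
        {"name": "FOLDER2", "type": "Folder"},
        {"name": "FOLDER3", "type": "Folder"},
    ],
    "FOLDER1": [{"name": f"file{i}", "type": "File"} for i in (1, 2, 3)],
    "FOLDER2": [{"name": f"file{i}", "type": "File"} for i in (1, 2)],
    "FOLDER3": [{"name": f"file{i}", "type": "File"} for i in (1, 2, 3)],
}


def get_metadata(folder):
    """Imitates a Get Metadata activity with fieldList ['childItems']."""
    return CHILD_ITEMS[folder]


def filter_items(items, wanted_type):
    """Imitates a Filter activity with @equals(item().type, wanted_type)."""
    return [item for item in items if item["type"] == wanted_type]


def count_files_one_level(root="root"):
    children = get_metadata(root)                # Get Metadata1
    folders = filter_items(children, "Folder")   # Filter1
    root_files = filter_items(children, "File")  # Filter2

    subfolder_files = []                         # array variable 'subfolder files'
    for folder in folders:                       # ForEach1
        sub_children = get_metadata(folder["name"])     # Get Metadata2
        sub_files = filter_items(sub_children, "File")  # Filter4
        subfolder_files.append(len(sub_files))          # Append variable1

    # Running total with two variables, as in the pipeline.
    subfilecount = 0
    subfilecounttemp = 0
    for n in subfolder_files:                    # ForEach2 (sequential)
        subfilecount = n + subfilecounttemp      # Set variable4
        subfilecounttemp = subfilecount          # Set variable5

    return len(root_files) + subfilecount        # Set variable3


if __name__ == "__main__":
    print(count_files_one_level())
```

With this sample data the root holds 2 files and the three subfolders hold 3, 2 and 3, so the function returns 10. The second loop has to run sequentially because each iteration depends on the total left by the previous one, which is why ForEach2 sets isSequential to true.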
My pipeline.json:
{
"name": "pipeline3",
"properties": {
"activities": [
{
"name": "Get Metadata1",
"type": "GetMetadata",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "DelimitedText2",
"type": "DatasetReference"
},
"fieldList": [
"childItems"
],
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
}
},
{
"name": "Filter1",
"type": "Filter",
"dependsOn": [
{
"activity": "Get Metadata1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@activity('Get Metadata1').output.childItems",
"type": "Expression"
},
"condition": {
"value": "@equals(item().type,'Folder')",
"type": "Expression"
}
}
},
{
"name": "Filter2",
"type": "Filter",
"dependsOn": [
{
"activity": "Get Metadata1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@activity('Get Metadata1').output.childItems",
"type": "Expression"
},
"condition": {
"value": "@equals(item().type,'File')",
"type": "Expression"
}
}
},
{
"name": "ForEach1",
"type": "ForEach",
"dependsOn": [
{
"activity": "Filter1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@activity('Filter1').output.value",
"type": "Expression"
},
"activities": [
{
"name": "Get Metadata2",
"type": "GetMetadata",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "DelimitedText3",
"type": "DatasetReference",
"parameters": {
"subfoldername": {
"value": "@item().name",
"type": "Expression"
}
}
},
"fieldList": [
"childItems"
],
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
}
},
{
"name": "Filter3",
"type": "Filter",
"dependsOn": [
{
"activity": "Get Metadata2",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@activity('Get Metadata2').output.childItems",
"type": "Expression"
},
"condition": {
"value": "@equals(item().type,'Folder')",
"type": "Expression"
}
}
},
{
"name": "Filter4",
"type": "Filter",
"dependsOn": [
{
"activity": "Get Metadata2",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@activity('Get Metadata2').output.childItems",
"type": "Expression"
},
"condition": {
"value": "@equals(item().type,'File')",
"type": "Expression"
}
}
},
{
"name": "Append variable1",
"type": "AppendVariable",
"dependsOn": [
{
"activity": "Filter4",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"variableName": "subfolder files",
"value": {
"value": "@activity('Filter4').output.FilteredItemsCount",
"type": "Expression"
}
}
}
]
}
},
{
"name": "Set variable3",
"type": "SetVariable",
"dependsOn": [
{
"activity": "Filter2",
"dependencyConditions": [
"Succeeded"
]
},
{
"activity": "ForEach2",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"variableName": "subfilecounttemp",
"value": {
"value": "@add(activity('Filter2').output.FilteredItemsCount,variables('subfilecount'))",
"type": "Expression"
}
}
},
{
"name": "ForEach2",
"type": "ForEach",
"dependsOn": [
{
"activity": "ForEach1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@variables('subfolder files')",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "Set variable4",
"type": "SetVariable",
"dependsOn": [],
"policy": {
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"variableName": "subfilecount",
"value": {
"value": "@add(item(),variables('subfilecounttemp'))",
"type": "Expression"
}
}
},
{
"name": "Set variable5",
"type": "SetVariable",
"dependsOn": [
{
"activity": "Set variable4",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"variableName": "subfilecounttemp",
"value": {
"value": "@variables('subfilecount')",
"type": "Expression"
}
}
}
]
}
}
],
"variables": {
"subfolder": {
"type": "String"
},
"subfolder files": {
"type": "Array"
},
"subfilecounttemp": {
"type": "Integer"
},
"fies": {
"type": "Array"
},
"subfilecount": {
"type": "Integer"
}
},
"annotations": []
}
}
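Note that this pipeline counts the files in the root directory and in its immediate subfolders only. Filter3 collects the folders found inside each subfolder, but no later activity uses its output. Since the comments say files can sit at any depth, it may help to have a reference count to compare the pipeline result against. The following Python sketch is one way to compute it, assuming a hypothetical nested tree in which each folder entry carries a "children" list. That key is an assumption for illustration only, since Get Metadata returns a flat childItems list per folder.

```python
def count_files(tree):
    """Count entries of type 'File' at every depth of a nested item list."""
    total = 0
    for item in tree:
        if item["type"] == "File":
            total += 1
        elif item["type"] == "Folder":
            # "children" is a hypothetical key holding the folder's own items.
            total += count_files(item.get("children", []))
    return total


# Hypothetical sample: one root file, one file in FOLDER1, two files in a
# nested folder, and an empty FOLDER2.
SAMPLE = [
    {"name": "FILE1", "type": "File"},
    {"name": "FOLDER1", "type": "Folder", "children": [
        {"name": "file1", "type": "File"},
        {"name": "NESTED", "type": "Folder", "children": [
            {"name": "deep1", "type": "File"},
            {"name": "deep2", "type": "File"},
        ]},
    ]},
    {"name": "FOLDER2", "type": "Folder", "children": []},
]

if __name__ == "__main__":
    print(count_files(SAMPLE))
```

For this sample the recursive count is 4, whereas a one-level count like the pipeline's would give 2, because it would not descend into NESTED.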