MongoDB document re-shaping

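This question (as mine usually do) comes from perusing questions asked on SO and raising another question for myself. While working towards a solution to a problem as a learning exercise, I find that another problem, such as this one, pops up.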

The original question remains unaccepted by the OP, and it was never really clear what "they" wanted to achieve. But I did give my interpretation, in both a simple and a long form of arriving at a solution.

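In the end, considering the long form of the solution, the process left me thinking that some new features coming in the next MongoDB release (currently expected to be 2.6), namely the additional aggregation operators being introduced, might help here.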

So the case is as follows:

{
    "tracked_item_type" : "Software",
    "tracked_item_name" : "Word",
    "duration" : 9540
}
{
    "tracked_item_type" : "Software",
    "tracked_item_name" : "Excel",
    "duration" : 4000
}
{
    "tracked_item_type" : "Software",
    "tracked_item_name" : "Notepad",
    "duration" : 4000
}
{
    "tracked_item_type" : "Site",
    "tracked_item_name" : "Facebook",
    "duration" : 7920
}
{
    "tracked_item_type" : "Site",
    "tracked_item_name" : "Twitter",
    "duration" : 5555
}
{
    "tracked_item_type" : "Site",
    "tracked_item_name" : "Digital Blasphemy",
    "duration" : 8000
}

The goal is the top two results for each type, ordered by total duration. Despite this small sample, the duration is considered to be the $sum of many items.

{ 
    "tracked_item_type": "Site",
    "tracked_item_name": "Digital Blasphemy",
    "duration" : 8000
}
{ 
    "tracked_item_type": "Site",
    "tracked_item_name": "Facebook",
    "duration" : 7920
}
{ 
    "tracked_item_type": "Software",
    "tracked_item_name": "Word",
    "duration" : 9540
}
{ 
    "tracked_item_type": "Software",
    "tracked_item_name": "Notepad",
    "duration" : 4000
}
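To pin down what the pipelines below are meant to produce, here is a minimal client-side sketch in plain JavaScript (not an aggregation pipeline, and not from the original post) that computes "top N per type by total duration" over the sample documents. The function name `topNPerType` is hypothetical. Ties are broken by the order in which items are first seen, which is an assumption since the question does not define tie-breaking: Excel and Notepad both total 4000, so this sketch keeps Excel where the expected output above shows Notepad.

```javascript
// Sample documents copied from the question.
const docs = [
  { tracked_item_type: "Software", tracked_item_name: "Word", duration: 9540 },
  { tracked_item_type: "Software", tracked_item_name: "Excel", duration: 4000 },
  { tracked_item_type: "Software", tracked_item_name: "Notepad", duration: 4000 },
  { tracked_item_type: "Site", tracked_item_name: "Facebook", duration: 7920 },
  { tracked_item_type: "Site", tracked_item_name: "Twitter", duration: 5555 },
  { tracked_item_type: "Site", tracked_item_name: "Digital Blasphemy", duration: 8000 },
];

// Hypothetical reference implementation of "top n per type by total duration".
function topNPerType(docs, n) {
  // Step 1: sum durations per (type, name), like the first $group stage.
  const totals = new Map();
  for (const d of docs) {
    const key = JSON.stringify([d.tracked_item_type, d.tracked_item_name]);
    const prev = totals.get(key);
    if (prev) {
      prev.duration += d.duration;
    } else {
      totals.set(key, {
        tracked_item_type: d.tracked_item_type,
        tracked_item_name: d.tracked_item_name,
        duration: d.duration,
      });
    }
  }
  // Step 2: type ascending, then duration descending, like the $sort stage.
  // Array.prototype.sort is stable (ES2019+), so ties keep first-seen order.
  const sorted = [...totals.values()].sort((a, b) =>
    a.tracked_item_type < b.tracked_item_type ? -1 :
    a.tracked_item_type > b.tracked_item_type ? 1 :
    b.duration - a.duration);
  // Step 3: keep at most n entries per type.
  const counts = new Map();
  return sorted.filter(d => {
    const c = counts.get(d.tracked_item_type) || 0;
    counts.set(d.tracked_item_type, c + 1);
    return c < n;
  });
}
```

Calling `topNPerType(docs, 2)` yields the two Site entries followed by the two Software entries, in the same order as the expected output.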

My long way of solving this was:

db.collection.aggregate([

    // Group on the types and "sum" of duration
    {"$group": {
        "_id": {
            "tracked_item_type": "$tracked_item_type",
            "tracked_item_name": "$tracked_item_name"
         },
        "duration": {"$sum": "$duration"}
    }},

    // Sort by type and duration descending
    {"$sort": { "_id.tracked_item_type": 1, "duration": -1 }},

    /* The fun part */

    // Re-shape results to "sites" and "software" arrays 
    {"$group": { 
        "_id": null,
        "sites": {"$push":
            {"$cond": [
                {"$eq": ["$_id.tracked_item_type", "Site" ]},
                { "_id": "$_id", "duration": "$duration" },
                null
            ]}
        },
        "software": {"$push":
            {"$cond": [
                {"$eq": ["$_id.tracked_item_type", "Software" ]},
                { "_id": "$_id", "duration": "$duration" },
                null
            ]}
        }
    }},


    // Remove the null values for "software"
    {"$unwind": "$software"},
    {"$match": { "software": {"$ne": null} }},
    {"$group": { 
        "_id": "$_id",
        "software": {"$push": "$software"}, 
        "sites": {"$first": "$sites"} 
    }},

    // Remove the null values for "sites"
    {"$unwind": "$sites"},
    {"$match": { "sites": {"$ne": null} }},
    {"$group": { 
        "_id": "$_id",
        "software": {"$first": "$software"},
        "sites": {"$push": "$sites"} 
    }},


    // Project out software and limit to the *top* 2 results
    {"$unwind": "$software"},
    {"$project": { 
        "_id": { "_id": "$software._id", "duration": "$software.duration" },
        "sites": "$sites"
    }},
    {"$limit" : 2},


    // Project sites, grouping multiple software per key, requires a sort
    // then limit the *top* 2 results
    {"$unwind": "$sites"},
    {"$group": {
        "_id": { "_id": "$sites._id", "duration": "$sites.duration" },
        "software": {"$push": "$_id" }
    }},
    {"$sort": { "_id.duration": -1 }},
    {"$limit": 2}

])

And that is where it falls short of the final result, at least in my current understanding:

{
    "result" : [
        {
            "_id" : {
                "_id" : {
                    "tracked_item_type" : "Site",
                    "tracked_item_name" : "Digital Blasphemy"
                },
                "duration" : 8000
            },
            "software" : [
                {
                    "_id" : {
                        "tracked_item_type" : "Software",
                        "tracked_item_name" : "Word"
                    },
                    "duration" : 9540
                },
                {
                    "_id" : {
                        "tracked_item_type" : "Software",
                        "tracked_item_name" : "Notepad"
                    },
                    "duration" : 4000
                }
            ]
        },
        {
            "_id" : {
                "_id" : {
                    "tracked_item_type" : "Site",
                    "tracked_item_name" : "Facebook"
                },
                "duration" : 7920
            },
            "software" : [
                {
                    "_id" : {
                        "tracked_item_type" : "Software",
                        "tracked_item_name" : "Word"
                    },
                    "duration" : 9540
                },
                {
                    "_id" : {
                        "tracked_item_type" : "Software",
                        "tracked_item_name" : "Notepad"
                    },
                    "duration" : 4000
                }
            ]
        }
    ],
    "ok" : 1
}

While none of this is complete, it seemed very reasonable (to me, anyway) to massage the result into the desired form with post-processing in code.
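A minimal sketch of such client-side post-processing, assuming plain JavaScript applied to the `result` array of the aggregation output shown above; the function name `flattenPipelineResult` is hypothetical. Every site document carries the same `software` array, so entries are de-duplicated on type and name before the final sort.

```javascript
// Hypothetical post-processing of the long pipeline's output.
function flattenPipelineResult(result) {
  const out = [];
  const seen = new Set(); // each site document repeats the same "software" array
  const add = (entry) => {
    const key = JSON.stringify([entry._id.tracked_item_type, entry._id.tracked_item_name]);
    if (seen.has(key)) return;
    seen.add(key);
    out.push({
      tracked_item_type: entry._id.tracked_item_type,
      tracked_item_name: entry._id.tracked_item_name,
      duration: entry.duration,
    });
  };
  for (const doc of result) {
    add(doc._id);              // the site entry is stored under _id
    doc.software.forEach(add); // software entries, de-duplicated by add()
  }
  // Final order as in the expected output: type ascending, duration descending.
  return out.sort((a, b) =>
    a.tracked_item_type < b.tracked_item_type ? -1 :
    a.tracked_item_type > b.tracked_item_type ? 1 :
    b.duration - a.duration);
}

// The "result" array from the output above.
const software = [
  { _id: { tracked_item_type: "Software", tracked_item_name: "Word" }, duration: 9540 },
  { _id: { tracked_item_type: "Software", tracked_item_name: "Notepad" }, duration: 4000 },
];
const result = [
  { _id: { _id: { tracked_item_type: "Site", tracked_item_name: "Digital Blasphemy" }, duration: 8000 }, software },
  { _id: { _id: { tracked_item_type: "Site", tracked_item_name: "Facebook" }, duration: 7920 }, software },
];
```

`flattenPipelineResult(result)` then returns four flat documents in the shape the question asks for.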

But as an exercise, it is a point of intrigue whether the desired result form can be reached using upcoming aggregation features (or perhaps some other technique that has eluded me).

So please answer with any suggestions or pointers as to how this can be achieved.

Solutions

==============================

1. Here is an aggregation that finds the top two by duration in each category (it breaks "ties" arbitrarily, which seems to be in line with your sample output):

var pregroup = { "$group" : {
        "_id" : {
            "type" : "$tracked_item_type",
            "name" : "$tracked_item_name"
        },
        "duration" : {
            "$sum" : "$duration"
        }
    }
};
var sort = { "$sort" : { "_id.type" : 1, "duration" : -1 } };
var group1 = { "$group" : {
        "_id" : "$_id.type",
        "num1" : {
            "$first" : {
                "name" : "$_id.name",
                "dur" : "$duration"
            }
        },
        "other" : {
            "$push" : {
                "name" : "$_id.name",
                "dur" : "$duration"
            }
        },
        "all" : {
            "$push" : {
                "name" : "$_id.name",
                "dur" : "$duration"
            }
        }
    }
};
var unwind = { "$unwind" : "$other" };
var project = {
    "$project" : {
        "keep" : {
            "$ne" : [
                "$num1.name",
                "$other.name"
            ]
        },
        "num1" : 1,
        "all" : 1,
        "other" : 1
    }
};
var match = { "$match" : { "keep" : true } };
var sort2 = { "$sort" : { "_id" : 1, "other.dur" : -1 } };
var group2 = { "$group" : {
        "_id" : "$_id",
        "numberOne" : {
            "$first" : "$num1"
        },
        "numberTwo" : {
            "$first" : "$other"
        },
        "all" : {
            "$first" : "$all"
        }
    }
};
var unwind2 = { "$unwind" : "$all" };
var project2 = { "$project" : {
    "_id" : 0,
    "tracked_item_type" : "$_id",
    "tracked_item_name" : {
        "$cond" : [
            {
                "$or" : [
                    {
                        "$eq" : [
                            "$all.name",
                            "$numberOne.name"
                        ]
                    },
                    {
                        "$eq" : [
                            "$all.name",
                            "$numberTwo.name"
                        ]
                    }
                ]
            },
            "$all.name",
            null
        ]
    },
    "duration" : {
        "$cond" : [
            {
                "$or" : [
                    {
                        "$eq" : [
                            "$all.name",
                            "$numberOne.name"
                        ]
                    },
                    {
                        "$eq" : [
                            "$all.name",
                            "$numberTwo.name"
                        ]
                    }
                ]
            },
            "$all.dur",
            null
        ]
    }
}
};
var match2 = { "$match" : { "tracked_item_name" : { "$ne" : null } } };

Running this on the sample data:

db.top2.aggregate([pregroup, sort, group1, unwind, project, match, sort2, group2, unwind2, project2, match2]).toArray()
[
    {
        "tracked_item_type" : "Software",
        "tracked_item_name" : "Word",
        "duration" : 9540
    },
    {
        "tracked_item_type" : "Software",
        "tracked_item_name" : "Notepad",
        "duration" : 4000
    },
    {
        "tracked_item_type" : "Site",
        "tracked_item_name" : "Digital Blasphemy",
        "duration" : 8000
    },
    {
        "tracked_item_type" : "Site",
        "tracked_item_name" : "Facebook",
        "duration" : 7920
    }
]

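This works with any number of domains (different tracked_item_type values), and you don't need to know their names in advance. However, generalizing it to the top three, four, five and so on adds four or more stages for each additional top "N" value, which is neither very practical nor pretty.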

Please vote for this JIRA ticket to get a more native implementation of "top N" functionality in the aggregation framework.
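To make the cost of generalizing concrete, the idea behind the stages above can be modeled in plain JavaScript: take the first entry of a list already sorted by duration, exclude it, and take the first of what remains, one round per additional "N". This is a hypothetical sketch of the logic only (`pickTopNByRounds` is not part of any API); in the pipeline, each extra round costs another `$unwind`/`$project`/`$match`/`$sort`/`$group` cycle. One difference: the pipeline above drops a type that has only one item, because its single `other` entry equals `num1` and is filtered out by the `$match` on `keep`, whereas this sketch simply returns fewer than N entries.

```javascript
// Per-type list as group1 builds it in "all": { name, dur } entries,
// assumed already sorted by dur descending (the preceding $sort stage).
const siteTotals = [
  { name: "Digital Blasphemy", dur: 8000 },
  { name: "Facebook", dur: 7920 },
  { name: "Twitter", dur: 5555 },
];

// Hypothetical model of the "first, exclude, first again" rounds.
// Round 1 corresponds to "$first" in group1; each later round corresponds to
// one more unwind/project/match/sort/group cycle in the pipeline.
function pickTopNByRounds(all, n) {
  const picked = [];
  for (let round = 0; round < n; round++) {
    // Like the "keep" flag plus $match: discard names already picked.
    const remaining = all.filter(e => !picked.some(p => p.name === e.name));
    if (remaining.length === 0) break; // fewer than n items of this type
    // Like "$first" on a list sorted by dur descending: the next best entry.
    picked.push(remaining[0]);
  }
  return picked;
}
```

`pickTopNByRounds(siteTotals, 2)` picks Digital Blasphemy and then Facebook; asking for more rounds than there are items just stops early.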

==============================

2. I hadn't expected it, but there is an answer to be found in the 2.6 implementation, which includes some nice new operators.

What I (eventually) realized is that the problem came down to having two lists that need to become one: a way to merge those items so that they all fall under a single field. And there is an obvious operator for this: $setUnion.
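For readers unfamiliar with `$setUnion`: it returns the distinct elements found in any of its input arrays, and the MongoDB documentation states that the order of the output array is unspecified, which is why the full pipeline ends with a `$sort`. Below is a rough client-side model in plain JavaScript; `setUnion` is a hypothetical helper (not a driver function), and `JSON.stringify` stands in for BSON equality purely as an approximation for illustration.

```javascript
// Hypothetical model of $setUnion: each distinct element of the input arrays
// appears exactly once. JSON.stringify approximates BSON equality here.
function setUnion(...arrays) {
  const byKey = new Map();
  for (const arr of arrays) {
    for (const el of arr) {
      const key = JSON.stringify(el);
      if (!byKey.has(key)) byKey.set(key, el);
    }
  }
  // MongoDB leaves the order unspecified; this model keeps first-seen order.
  return [...byKey.values()];
}

// The two arrays from the intermediate document in this answer.
const software = [
  { _id: { tracked_item_type: "Software", tracked_item_name: "Word" }, duration: 9540 },
  { _id: { tracked_item_type: "Software", tracked_item_name: "Notepad" }, duration: 4000 },
];
const sites = [
  { _id: { tracked_item_type: "Site", tracked_item_name: "Digital Blasphemy" }, duration: 8000 },
  { _id: { tracked_item_type: "Site", tracked_item_name: "Facebook" }, duration: 7920 },
];
const records = setUnion(software, sites);
```

Because every entry's `_id` includes its `tracked_item_type`, a site entry and a software entry can never compare equal, so the de-duplication loses nothing in this case.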

Here is the first new piece, which I'll explain in parts:

// So this part just "normalizes" a little so we get one record that essentially has
// two arrays in it
{"$group": { 
    _id: { _id: null, software: "$software"  },
    sites: {$push:"$_id" }
}},

And the resulting document:

{
    "_id" : {
        "_id" : null,
        "software" : [
            {
                "_id" : {
                    "tracked_item_type" : "Software",
                    "tracked_item_name" : "Word"
                },
                "duration" : 9540
            },
            {
                "_id" : {
                    "tracked_item_type" : "Software",
                    "tracked_item_name" : "Notepad"
                },
                "duration" : 4000
            }
        ]
    },
    "sites" : [
        {
            "_id" : {
                "tracked_item_type" : "Site",
                "tracked_item_name" : "Digital Blasphemy"
            },
            "duration" : 8000
        },
        {
            "_id" : {
                "tracked_item_type" : "Site",
                "tracked_item_name" : "Facebook"
            },
            "duration" : 7920
        }
    ]
}

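This is arguably a better form of result than where I left off before: the items are no longer duplicated, so basically we have two lists that we want to merge into one. All that remains is to use an operator that facilitates this merge: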

// Then we just project with a new field, and the "$setUnion" of the two arrays
{"$project": { 
    "_id": 0,
    "records": {"$setUnion": ["$_id.software", "$sites"]} 
}},

And that gives us:

{
    "records" : [
        {
            "_id" : {
                "tracked_item_type" : "Site",
                "tracked_item_name" : "Facebook"
            },
            "duration" : 7920
        },
        {
            "_id" : {
                "tracked_item_type" : "Software",
                "tracked_item_name" : "Word"
            },
            "duration" : 9540
        },
        {
            "_id" : {
                "tracked_item_type" : "Site",
                "tracked_item_name" : "Digital Blasphemy"
            },
            "duration" : 8000
        },
        {
            "_id" : {
                "tracked_item_type" : "Software",
                "tracked_item_name" : "Notepad"
            },
            "duration" : 4000
        }
    ]
}

And that's basically it. Now that we have the four items, a little "unwinding", projection and sorting gets exactly the result I was looking for.

So here is the whole thing, for the record:

db.collection.aggregate([

    // Group on the types and "sum" of duration
    {"$group": {
        "_id": {
            "tracked_item_type": "$tracked_item_type",
            "tracked_item_name": "$tracked_item_name"
         },
        "duration": {"$sum": "$duration"}
    }},

    // Sort by type and duration descending
    {"$sort": { "_id.tracked_item_type": 1, "duration": -1 }},

    /* The fun part */

    // Re-shape results to "sites" and "software" arrays 
    {"$group": { 
        "_id": null,
        "sites": {"$push":
            {"$cond": [
                {"$eq": ["$_id.tracked_item_type", "Site" ]},
                { "_id": "$_id", "duration": "$duration" },
                null
            ]}
        },
        "software": {"$push":
            {"$cond": [
                {"$eq": ["$_id.tracked_item_type", "Software" ]},
                { "_id": "$_id", "duration": "$duration" },
                null
            ]}
        }
    }},


    // Remove the null values for "software"
    {"$unwind": "$software"},
    {"$match": { "software": {"$ne": null} }},
    {"$group": { 
        "_id": "$_id",
        "software": {"$push": "$software"}, 
        "sites": {"$first": "$sites"} 
    }},

    // Remove the null values for "sites"
    {"$unwind": "$sites"},
    {"$match": { "sites": {"$ne": null} }},
    {"$group": { 
        "_id": "$_id",
        "software": {"$first": "$software"},
        "sites": {"$push": "$sites"} 
    }},


    // Project out software and limit to the *top* 2 results
    {"$unwind": "$software"},
    {"$project": { 
        "_id": { "_id": "$software._id", "duration": "$software.duration" },
        "sites": "$sites"
    }},
    {"$limit" : 2},


    // Project sites, grouping multiple software per key, requires a sort
    // then limit the *top* 2 results
    {"$unwind": "$sites"},
    {"$group": {
        "_id": { "_id": "$sites._id", "duration": "$sites.duration" },
        "software": {"$push": "$_id" }
    }},
    {"$sort": { "_id.duration": -1 }},
    {"$limit": 2},

    // So this part just "normalizes" a little so we get one record that
    // essentially has two arrays in it
    {"$group": { 
        _id: { _id: null, software: "$software"  },
        sites: {$push:"$_id" }
    }},

    // Then we just project with a new field, and the "$setUnion" of the two arrays
    {"$project": { 
        "_id": 0,
        "records": {"$setUnion": ["$_id.software", "$sites"]} 
    }},

    // Unwind the array to documents
    {"$unwind": "$records"},

    // Shape the final output
    {"$project": { 
        "tracked_item_type": "$records._id.tracked_item_type",
        "tracked_item_name": "$records._id.tracked_item_name",
        "duration": "$records.duration"
    }},

    // Final sort on the result
    {"$sort": { "tracked_item_type": 1, "duration": -1 }} 

])

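Obviously there is a drop-off point where the whole approach becomes impractical: the general premise relies on pushing all of the documents into their own arrays so that the top results can eventually be pulled out by calling $limit, and the size of those arrays may well limit the results (every document in the pipeline, including one built by $group, is bound by the 16 MB BSON document size limit).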

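So where there would be a large number of results for each "category", the more practical approach is probably to process each "category" individually and simply $limit each of those results to the top two items required.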
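A minimal sketch of that per-category approach, assuming MongoDB 2.6 or later (for `$literal`). `topNPipelineForType` is a hypothetical helper that only builds the pipeline array, so it can be checked without a server. The secondary sort on `_id` is a choice made here so that ties such as Excel and Notepad come out in a deterministic, name-based order.

```javascript
// Hypothetical helper: the aggregation pipeline for a single category.
function topNPipelineForType(type, n) {
  return [
    // Restrict to one category, so no document ever has to hold all of them.
    { $match: { tracked_item_type: type } },
    // Total duration per item name within this category.
    { $group: { _id: "$tracked_item_name", duration: { $sum: "$duration" } } },
    // Largest totals first; ties ordered by name so the result is deterministic.
    { $sort: { duration: -1, _id: 1 } },
    // Keep only the top n items of this category.
    { $limit: n },
    // Reshape to the output form used in the question.
    { $project: {
        _id: 0,
        tracked_item_type: { $literal: type },
        tracked_item_name: "$_id",
        duration: 1,
    } },
  ];
}
```

In the mongo shell one would presumably loop over `db.collection.distinct("tracked_item_type")` and run `db.collection.aggregate(topNPipelineForType(t, 2))` for each type `t`: one small aggregation per category instead of a single document holding everything.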

But as an exercise, at least I now know this can be done. I hope this is useful to someone.

I'm still interested, though, in whether anyone has another approach.

from https://stackoverflow.com/questions/21949521/mongodb-document-re-shaping by cc-by-sa and MIT license
