Project management on Data Engineer

How to create Azure DevOps Pull Requests reporting with Power BI

Sun, 18 Aug 2024 00:00:00 +0000

As a developer, I have always emphasized the importance of code quality and efficient development processes. Modern Git workflows are typically about writing code, commits, pull requests, code reviews, and merges. To gain deeper insight into these processes, I decided to create a Power BI report to track them. My goal is to identify bottlenecks, areas for improvement, and opportunities to streamline our workflow.

Pre-requisites

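Before building the Power BI report, you need Power BI Desktop, of course. You also need a Personal Access Token (PAT) with sufficient access to the project repositories (read access to Code covers the APIs used here). Power BI uses it to authenticate the API calls: when it asks for credentials for the dev.azure.com URL, choose Basic authentication and enter the token as the password (the user name is not used).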
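Power BI builds the authentication header for you, but it can help to see what actually goes over the wire. Azure DevOps accepts a PAT through HTTP Basic authentication with an empty user name and the token as the password. The minimal sketch below shows how that header is formed. The token is made up and `basic_auth_header` is a hypothetical helper, not something the report itself needs.

```python
import base64


def basic_auth_header(pat: str) -> dict[str, str]:
    """Build the Authorization header Azure DevOps expects for a PAT.

    The user name part is left empty; only the token is used as the password.
    """
    token = base64.b64encode(f":{pat}".encode("ascii")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Hypothetical token: never hard-code a real one in a script or report.
    print(basic_auth_header("my-fake-pat"))
```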

Parameters

To make the report work with different settings, we will use parameters. These parameters allow you to easily apply my code to your project. Just copy the code and edit the following parameters:

_organization: The Azure DevOps organization
_project: Your project. The report will retrieve pull requests from all repositories in the project.
_top: The number of most recent pull requests you want to analyze in the report.

Build the Power Query

Fetch data from Azure DevOps

Now that you have set up your Power BI report with parameters and prepared the necessary credentials, it’s time to pull data from Azure DevOps. Power BI has a built-in Azure DevOps connector, but it only provides board data. To retrieve pull request information, we need to call the Azure DevOps REST APIs.

See the following Power BI M query:

Source = Json.Document(Web.Contents("https://dev.azure.com/"&_organization&"/"&_project&"/_apis/git/pullrequests?searchCriteria.includeLinks=true&searchCriteria.status=all&$top="&Text.From(_top)&"&api-version=7.1-preview.1")),

The Web.Contents function calls the REST API and returns the response as a binary. Json.Document then parses that binary as JSON. After this step, Source is a record with two fields:

value: a list of all pull request records.
count: the length of the value list.
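To see the request and response shapes outside Power BI, here is a minimal Python sketch that builds the same URL from the three parameters and parses a response body into a record with value and count. The organization and project names, the helper `pull_requests_url`, and the sample body are all hypothetical. No network call is made.

```python
import json
from urllib.parse import quote, urlencode


def pull_requests_url(organization: str, project: str, top: int) -> str:
    """Build the same pull request list URL as the Source step (hypothetical helper)."""
    query = {
        "searchCriteria.includeLinks": "true",
        "searchCriteria.status": "all",
        "$top": str(top),
        "api-version": "7.1-preview.1",
    }
    # quote() escapes spaces in names; safe="$" keeps "$top" readable.
    base = f"https://dev.azure.com/{quote(organization)}/{quote(project)}/_apis/git/pullrequests"
    return base + "?" + urlencode(query, safe="$")


# A trimmed, made-up response body with the same two fields as the real one.
sample_body = '{"value": [{"pullRequestId": 1}, {"pullRequestId": 2}], "count": 2}'
source = json.loads(sample_body)

print(pull_requests_url("my-org", "My Project", 50))
print(source["count"], len(source["value"]))
```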

Convert to Table

The previous step produced a record containing the pull request data. To make this data available for further analysis, we need to convert it into a table.

#"Converted to Table" = Table.FromRecords({Source}, {"value"}),

The query above converts value into a table. The returned table has only one column and one row, which holds the whole list:

value
List

To make the table usable, we need to transform it further. First, we expand the list into rows, one row per pull request:

#"Expanded value" = Table.ExpandListColumn(#"Converted to Table", "value"),

Then, for each row, we expand the record into columns. We don’t need all of the fields, so the M query below extracts only the ones we need.

#"Expanded value1" = Table.ExpandRecordColumn(
    #"Expanded value",
    "value",
    {"repository", "pullRequestId", "codeReviewId", "status", "createdBy", "creationDate", "closedDate", "title", "description", "sourceRefName", "targetRefName", "mergeStatus", "isDraft", "mergeId", "reviewers", "labels", "url", "completionOptions", "supportsIterations", "completionQueueTime"},
    {"value.repository", "value.pullRequestId", "value.codeReviewId", "value.status", "value.createdBy", "value.creationDate", "value.closedDate", "value.title", "value.description", "value.sourceRefName", "value.targetRefName", "value.mergeStatus", "value.isDraft", "value.mergeId", "value.reviewers", "value.labels", "value.url", "value.completionOptions", "value.supportsIterations", "value.completionQueueTime"}
),
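The two expansion steps can be mirrored in plain Python to make their effect concrete: each pull request in the list becomes one row, and only the selected fields become "value.*" columns. Fields missing from a record come out as None, just as Power Query fills them with null (an active pull request has no closedDate, for example). The column subset and sample records below are made up for illustration.

```python
from typing import Any

# Hypothetical subset of the columns kept by Table.ExpandRecordColumn above.
COLUMNS = ["pullRequestId", "status", "title", "creationDate", "closedDate"]


def expand_value(source: dict[str, Any], columns: list[str]) -> list[dict[str, Any]]:
    """One row per pull request, one 'value.<field>' column per selected field."""
    return [
        {f"value.{name}": pr.get(name) for name in columns}
        for pr in source["value"]
    ]


source = {
    "count": 2,
    "value": [
        {"pullRequestId": 7, "status": "completed", "title": "Fix typo",
         "creationDate": "2024-08-01T09:00:00Z", "closedDate": "2024-08-01T10:00:00Z",
         "mergeStatus": "succeeded"},
        {"pullRequestId": 8, "status": "active", "title": "Add report",
         "creationDate": "2024-08-02T09:00:00Z"},
    ],
}

for row in expand_value(source, COLUMNS):
    print(row)
```

Fields that are not listed, such as mergeStatus here, are simply dropped, which keeps the model small.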

Continue expanding columns

Even though the previous steps gave us a solid starting point, some columns still have nested records full of useful data. We will perform additional expansions to access this data.

/*
value.repository, value.createdBy and value.completionOptions are records, so we can expand them into columns
*/
#"Expanded value.repository" = Table.ExpandRecordColumn(#"Expanded value1", "value.repository", {"name"}, {"value.repository.name"}),
#"Expanded value.createdBy" = Table.ExpandRecordColumn(#"Expanded value.repository", "value.createdBy", {"displayName", "id", "uniqueName"}, {"value.createdBy.displayName", "value.createdBy.id", "value.createdBy.uniqueName"}),
#"Expanded value.completionOptions" = Table.ExpandRecordColumn(#"Expanded value.createdBy", "value.completionOptions", {"mergeCommitMessage", "mergeStrategy", "transitionWorkItems"}, {"value.completionOptions.mergeCommitMessage", "value.completionOptions.mergeStrategy", "value.completionOptions.transitionWorkItems"}),
/*
value.reviewers, on the other hand, is a list of records. For each list, we concatenate the displayName of every reviewer
*/
#"Expanded value.reviewers" = Table.TransformColumns(#"Expanded value.completionOptions", {{"value.reviewers", each Combiner.CombineTextByDelimiter(", ")(List.Transform(_, each [displayName]))}}),
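The nested expansions follow the same idea one level deeper. Here is a minimal Python sketch, using a made-up pull request record: it picks fields out of the repository and createdBy records and collapses the reviewers list into one comma-separated string of display names. `flatten_pr` is a hypothetical helper that only illustrates what the M steps do.

```python
from typing import Any


def flatten_pr(pr: dict[str, Any]) -> dict[str, Any]:
    """Pick fields from nested records and join reviewer names into one string."""
    repository = pr.get("repository") or {}
    created_by = pr.get("createdBy") or {}
    return {
        "value.repository.name": repository.get("name"),
        "value.createdBy.displayName": created_by.get("displayName"),
        "value.createdBy.uniqueName": created_by.get("uniqueName"),
        # Same result as Combiner.CombineTextByDelimiter(", ") over displayName.
        "value.reviewers": ", ".join(r["displayName"] for r in pr.get("reviewers", [])),
    }


pr = {
    "repository": {"id": "r1", "name": "data-platform"},
    "createdBy": {"displayName": "Alice", "id": "u1", "uniqueName": "alice@example.com"},
    "reviewers": [{"displayName": "Bob", "vote": 10}, {"displayName": "Carol", "vote": 0}],
}
print(flatten_pr(pr))
```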

Add details from other APIs

While the pull request endpoint provides a lot of useful information, it might not be enough. We often need to supplement our data with information from other Azure DevOps APIs to gain deeper insights. The process is similar to what we have done so far: pulling data from an API and expanding JSON objects.

Iterations

Iterations are created as a result of creating and pushing updates to a pull request: the first iteration is created together with the pull request, and each later push adds another one. The number of iterations is therefore one more than the number of updates pushed after the pull request was created. Below is the Power BI M query to get the number of iterations for each pull request:

#"Added iterations" = Table.AddColumn(#"Expanded value.reviewers", "iterations", each Json.Document(Web.Contents([value.url]&"/iterations?api-version=7.1-preview.1"))),
#"Expanded iterations" = Table.ExpandRecordColumn(#"Added iterations", "iterations", {"count"}, {"iterations.count"}),
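What the iterations step extracts can be shown on a made-up response body in Python. The response carries a count, the number of updates after creation is one less, and the latest iteration id equals the count. That last point assumes, as the changes query in the next step does, that iteration ids run from 1 to count.

```python
import json

# Made-up, trimmed body of the iterations endpoint for one pull request.
iterations_body = json.loads('{"value": [{"id": 1}, {"id": 2}, {"id": 3}], "count": 3}')

iterations_count = iterations_body["count"]
# The first iteration is created together with the pull request, so the
# remaining ones correspond to updates pushed after creation.
updates_after_creation = iterations_count - 1
# Assumption: ids run from 1 to count, so the latest iteration id equals the count.
latest_iteration_id = max(item["id"] for item in iterations_body["value"])

print(iterations_count, updates_after_creation, latest_iteration_id)
```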

Changes

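Another good metric to track is the number of files changed in each pull request. Because a pull request can be updated several times, we want the changes across all of its iterations, not just the initial one. The code below requests the changes of the latest iteration, whose id is the iteration count. By default, the API compares that iteration with the common commit of the source and target branches, so the result covers every change in the pull request. The code then counts the change entries.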

#"Added iterations.changes" = Table.AddColumn(#"Expanded iterations", "iterations.changes", each Json.Document(Web.Contents([value.url]&"/iterations/"&Number.ToText([iterations.count])&"/changes?api-version=7.1-preview.1"))),
#"Expanded iterations.changes" = Table.ExpandRecordColumn(#"Added iterations.changes", "iterations.changes", {"changeEntries"}, {"iterations.changes.changeEntries"}),
#"Added iterations.changes.changeEntries.count" = Table.AddColumn(#"Expanded iterations.changes", "iterations.changes.changeEntries.count", each List.Count([iterations.changes.changeEntries])),
#"Removed iterations.changes.changeEntries" = Table.RemoveColumns(#"Added iterations.changes.changeEntries.count",{"iterations.changes.changeEntries"}),
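Here is a small Python sketch of the same idea. It builds the changes URL for a given iteration and counts the change entries in a response body. The pull request URL and the body are made up, and `changes_url` is a hypothetical helper that mirrors the string concatenation in the M query.

```python
import json


def changes_url(pr_url: str, iteration_id: int) -> str:
    """URL of the changes of one iteration, built the same way as in the M query."""
    return f"{pr_url}/iterations/{iteration_id}/changes?api-version=7.1-preview.1"


# Hypothetical pull request URL and a trimmed, made-up response body.
pr_url = "https://dev.azure.com/my-org/my-project-id/_apis/git/repositories/repo-id/pullRequests/42"
changes_body = json.loads(
    '{"changeEntries": ['
    '{"changeType": "edit", "item": {"path": "/src/etl.py"}},'
    '{"changeType": "add", "item": {"path": "/src/report.sql"}}'
    ']}'
)

print(changes_url(pr_url, 3))
print(len(changes_body["changeEntries"]))
```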

Threads

Threads are the Azure DevOps objects for managing and organizing pull request discussions. Team members can discuss specific changes directly by adding one or more comments to each thread. Analyzing threads can give us many useful insights.

#"Added threads" = Table.AddColumn(#"Removed iterations.changes.changeEntries", "threads", each Json.Document(Web.Contents([value.url]&"/threads?api-version=7.1-preview.1"))),
#"Expanded threads" = Table.ExpandRecordColumn(#"Added threads", "threads", {"value"}, {"threads.value"}),

For example, we can count the comment threads. A comment thread should have a status field (Active, Resolved, Closed, and so on), so we count the threads that have one:

#"Added threads.value.commentCount" = Table.AddColumn(#"Expanded threads", "threads.value.commentCount", each List.Count(List.Select([threads.value], each Record.HasFields(_, "status")))),
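The counting rule is easy to check on sample data. This Python sketch applies the same heuristic: a thread counts as a comment thread when it has a status field. The thread list is made up, with two discussions and one system thread that records a vote.

```python
from typing import Any


def comment_thread_count(threads: list[dict[str, Any]]) -> int:
    """Count threads that carry a 'status' field, the same rule as the M query."""
    return sum(1 for thread in threads if "status" in thread)


# Made-up threads: two discussions and one system thread recording a vote.
threads = [
    {"id": 1, "status": "active", "comments": [{"content": "Why a full load?"}]},
    {"id": 2, "status": "fixed", "comments": [{"content": "Typo"}]},
    {"id": 3, "properties": {"CodeReviewThreadType": {"$value": "VoteUpdate"}}},
]
print(comment_thread_count(threads))
```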

We can also get approval or rejection information from vote threads. A vote thread has a CodeReviewThreadType property with the value VoteUpdate, and its CodeReviewVoteResult property holds the vote. A positive value is an approval (10 for approved, 5 for approved with suggestions). A negative value means waiting for the author (-5) or rejected (-10). The M query below gets the first approval time of a pull request.

#"Added threads.value.firstApprovalTime" = Table.AddColumn(
    #"Added threads.value.commentCount",
    "threads.value.firstApprovalTime",
    each List.Min(
        List.Transform(
            [threads.value],
            (thread) =>
                let
                    // fall back to an empty record for threads without properties
                    props = Record.FieldOrDefault(thread, "properties", [])
                in
                    if
                        Record.HasFields(props, "CodeReviewThreadType") and Record.Field(props[CodeReviewThreadType], "$value") = "VoteUpdate"
                        and Record.HasFields(props, "CodeReviewVoteResult") and Number.FromText(Record.Field(props[CodeReviewVoteResult], "$value")) > 0
                    then thread[publishedDate]
                    else null
        )
    )
),
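The same logic in Python makes it easy to test against sample threads. The sketch keeps VoteUpdate threads with a positive CodeReviewVoteResult and returns the earliest publishedDate, or None when there is no approval. The thread records and the helpers `_prop` and `vote_thread` are made up for illustration. Comparing timestamps as strings assumes they all share the same UTC ISO 8601 format.

```python
from typing import Any, Optional


def _prop(thread: dict[str, Any], name: str) -> Optional[str]:
    """Return properties[name]['$value'], or None when it is missing."""
    return (thread.get("properties") or {}).get(name, {}).get("$value")


def first_approval_time(threads: list[dict[str, Any]]) -> Optional[str]:
    """Earliest publishedDate of a VoteUpdate thread with a positive vote."""
    approvals = [
        t["publishedDate"]
        for t in threads
        if _prop(t, "CodeReviewThreadType") == "VoteUpdate"
        and int(_prop(t, "CodeReviewVoteResult") or 0) > 0
    ]
    # Assumes every timestamp uses the same UTC ISO 8601 format, so string order is time order.
    return min(approvals, default=None)


def vote_thread(published: str, vote: int) -> dict[str, Any]:
    """Made-up system thread recording a reviewer's vote."""
    return {
        "publishedDate": published,
        "properties": {
            "CodeReviewThreadType": {"$value": "VoteUpdate"},
            "CodeReviewVoteResult": {"$value": str(vote)},
        },
    }


threads = [
    {"publishedDate": "2024-08-02T10:00:00Z", "status": "active"},  # plain comment
    vote_thread("2024-08-02T12:00:00Z", -5),  # waiting for author
    vote_thread("2024-08-03T09:30:00Z", 10),  # approved
    vote_thread("2024-08-03T08:15:00Z", 5),   # approved with suggestions
]
print(first_approval_time(threads))
```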

Full source code

You can grab the source code, paste it into the Power BI Power Query advanced editor, and customize it to suit your needs.


Visualize insights

Now we have a rich dataset. Power BI offers a wide range of visual elements to help you uncover trends, patterns, and insights. It’s time to bring our data to life with stunning visualizations.

Conclusion

Remember, this is just the beginning. As your project evolves and your data grows, you can expand your report to include additional metrics, refine visualizations, and explore new insights. Continuous improvement is essential to maximizing the value of your data.

By creating a comprehensive pull request report, you are taking the initial step toward establishing a culture of data-driven decision-making, first within your development team, then throughout your organization.

How to start a successful Data Warehouse project

Sun, 11 Aug 2024 00:00:00 +0000

Any organization aiming to leverage the power of data-driven decision-making stands to benefit greatly from a successful Data Warehouse project. A well-designed Data Warehouse not only centralizes your data but also guarantees that it is reliable, scalable, maintainable, and usable by stakeholders.

Over the past few months, my team and I launched a new Data Warehouse project into production. The opportunity to start from scratch is always a valuable chance to gain new insights and expertise. I would like to share the experiences from this success story in the hope that they will be as beneficial to others as they have been to us.

Understand Business Requirements

The first step in starting any project, not only a Data Warehouse, is to fully understand the business requirements. This is the difference between success and failure, not just a formality. If you skip this step, I can tell you with certainty that your project will be a waste of time, energy, and resources.

To really understand what the business wants to see and what your team needs to do, it’s essential to spend time talking to the people who will be using the Data Warehouse. What do they hope to accomplish? How will it help them do their jobs better? How do they plan to use the data? Getting a clear picture of their goals is essential to making sure your project is on the right track.

However, this is where things often get complicated. People often do not understand each other, especially people in different departments who have different perspectives, priorities, and terminology. Sometimes people do not even understand what they themselves are saying. Business people are easily attracted to marketing buzzwords on the Internet, believing that these terms are the solutions to their problems. I have to say that the marketing departments of data companies do a really good job of inventing new names for similar concepts. During this project, there were dozens of times when people told me "let’s use this tool", "why not use this technology?", or "money is not a problem" (until they actually got the bill).

In one of my previous projects, a stakeholder told me that he wanted a visually stunning real-time dashboard that would make the numbers dance instantly whenever users did something in the web application. And I had to explain to him:

Visually stunning: Yes, the data analysts team can always help you with that.
Real-time: There is no real time. If the sun disappeared, we would only know about it roughly 8 minutes later. The same goes for data.
And we do not really need it: the business is not going to sit still and watch the numbers dance every second.

Patience is the key. They do not understand those technical buzzwords. Yes. But isn’t that why you are here as a technical specialist? Your responsibility is to listen to them, understand them, empathize with them, and tell them what you will do to help them. Your job is to translate their requirements into a workable solution.

Remember that the business stakeholders are not only the end users, but also the investors. Without their buy-in, the project can’t even get off the ground. They are funding the project, and they deserve the best service.

By starting with a clear understanding of business requirements, you set the stage for a Data Warehouse project that is aligned with the organization’s goals, ensuring that the final product delivers real value.

Understand System

A Data Warehouse is not an isolated island. It is more like a bustling city that relies on a network of interconnected systems, receiving supplies from surrounding farms and industrial areas. Since a Data Warehouse pulls data from other systems, you cannot build a successful one without understanding how those systems work.

Imagine stakeholders telling you they want the sales figures. You then need to know exactly which systems hold the sales numbers. How is that number populated in each system? It may be entered manually by users, calculated automatically, synchronized from other sources, read-only, or editable… You need to know all the surrounding information to decide which system is the source of truth for the number you need. You may argue that all you need to do is copy the source database over and the business will know what to do with the data. Believe me, they don’t. In fact, they have never seen the database in their lives. And you are the one who will tell them what they can do with your Data Warehouse.

Not knowing how the systems work also puts your design at risk. You certainly don’t want to discover a surprise when you’re almost done with the implementation, such as a scheduled job that archives data from the database daily. Had you known that from the beginning, your design would have been very different.

Understanding the entire system in detail can be time-consuming. You should have a good sense of how the interconnected systems work together, but don’t expect to understand them in detail at the beginning of your project. Instead, I would suggest building strong relationships with the teams responsible for maintaining these systems. Meet with them, tell them what you are doing, and ask for their advice and insights. They are a goldmine of information. You can also experiment with sandbox environments and databases to uncover hidden patterns and processes.

Design a reliable Data Warehouse

Reliability is the backbone of any Data Warehouse. If your business can’t rely on the data coming out of your Data Warehouse, your project has failed.

Having a solid testing strategy helps greatly. Testing is not just about finding bugs; it’s about building confidence. When you start designing the Data Warehouse, think less about the times when the system is running happily, because there is nothing for us to do while it runs as it should. Think more about the times when the system is not working and what we are going to do then.

And even if you do your best, bugs and issues will still happen. Don’t expect your system to be bug-free; instead, build processes to handle issues as soon as they arise. And most importantly, be transparent. If the business comes to you and asks about an issue they found, tell them what happened and what you are doing to help. Transparency is the key to trust. If you tell a lie, you are part of the problem; if you are transparent, you are part of the solution. A reliable Data Warehouse isn’t just about technology. It’s about building trust.

Choose the right tool for the right job

To build a Data Warehouse, you need a toolbox filled with different pieces to complete the picture: tools for copying data, transforming it, orchestrating jobs, and more. It is technically possible to build these tools yourself, especially if you work in a big corporation and want to control every aspect of the technology. In most cases, however, it is impractical: you do not have enough resources to own the technology. Developing a Data Warehouse solution therefore usually means picking available tools and services and making them work together.

The real challenge is choosing the right tools. Beware of your enemies, the shiny marketing promises. The person who writes those buzzwords may not be the one who writes the code. Sometimes I don’t understand what they wrote, and I think they don’t understand what they wrote either. These tools are very expensive. It is important to avoid overkill. Focus on what your business really needs, not just what sounds cool. We are not going to use the most popular or the most expensive tools; we are going to find the right fit for our specific needs.

Start small, Grow big

Your investors do not have infinite patience. They want to see progress and value. Building something small but functional is far better than promising a grand project that never finishes. By starting small, you can quickly deliver value and gather feedback from users.

With limited resources, we cannot get everything done at once, so it is important to prioritize. What matters most to your business? What will have the biggest impact on your customers? Concentrate on delivering those core features first. Breaking the project into phases is a good practice: each phase focuses on specific business requirements, data sources, or user groups, and you gradually expand the capabilities of the Data Warehouse.

Engage users

A Data Warehouse is not just a technical marvel. It is a tool for your business. To ensure it delivers maximum value, you need to involve your users from the very beginning.

Imagine building a house without consulting the people who will live in it. People can still live in it, but they never feel it is their home. By involving them early and often, you will gain valuable insight into their needs, expectations, and challenges.

How can you engage your users?

Involve them in the planning phase: Understand their data needs, pain points, and desired outcomes.
Provide regular updates: Keep them informed about project progress and involve them in decision-making.
Offer training and support: Equip users with the skills to effectively use the Data Warehouse.
Gather feedback: Encourage users to share their thoughts and suggestions for improvement.

Remember that if you cannot engage your users, any number in their reports that looks slightly off will quickly become your problem. If you engage them and make them feel like part of the project, any issue becomes everyone’s problem.

Conclusion

Building a successful Data Warehouse is a challenging journey that requires careful planning, execution, and continuous improvement. It all starts with a deep understanding of the business requirements to ensure that every decision is aligned with the organization’s goals. Start small, iterate often, and always keep the user at the center of your efforts. A successful Data Warehouse is a collaboration between the engineering team and the business. By working together, you can create a solution that truly delivers value.