<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Project Management on Dat a Engineer</title><link>https://note.datengineer.dev/tags/project-management/</link><description>Recent content in Project Management on Dat a Engineer</description><image><title>Dat a Engineer</title><url>https://note.datengineer.dev/images/cover.png</url><link>https://note.datengineer.dev/images/cover.png</link></image><generator>Hugo -- 0.147.5</generator><language>en-us</language><lastBuildDate>Sun, 18 Aug 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://note.datengineer.dev/tags/project-management/index.xml" rel="self" type="application/rss+xml"/><item><title>How to create Azure DevOps Pull Requests reporting with Power BI</title><link>https://note.datengineer.dev/posts/how-to-create-azure-devops-pull-requests-reporting-with-power-bi/</link><pubDate>Sun, 18 Aug 2024 00:00:00 +0000</pubDate><guid>https://note.datengineer.dev/posts/how-to-create-azure-devops-pull-requests-reporting-with-power-bi/</guid><description>Gain insights from your Azure DevOps data with this step-by-step guide to building a comprehensive pull request report using Power BI.</description><content:encoded><![CDATA[<p>As a developer, I have always emphasized the importance of code quality and efficient development processes. Modern Git workflows are typically about writing code, commits, pull requests, code reviews, and merges. To gain deeper insight into these processes, I decided to create a Power BI report to track them. My goal is to identify bottlenecks, areas for improvement, and opportunities to streamline our workflow.</p>
<h2 id="pre-requisites">Pre-requisites</h2>
<p>Before we dive into building the Power BI report, make sure you have Power BI Desktop installed. You also need a Personal Access Token (PAT) with sufficient access to the project repositories. Power BI uses it to authenticate the API calls.</p>
<h2 id="parameters">Parameters</h2>
<p>To make the report work with different settings, we will use parameters. These parameters allow you to easily apply my code to your project. Just copy the code and edit the following parameters:</p>
<ul>
<li><code>_organization</code>: The Azure DevOps organization</li>
<li><code>_project</code>: Your project. The report will retrieve pull requests from all repositories in the project.</li>
<li><code>_top</code>: The number of most recent pull requests you want to analyze in the report.</li>
</ul>
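<p>As a rough sketch of how these parameters fit together, the snippet below defines them inline and builds a request URL. The values are placeholders, not taken from a real project, and in Power BI you would normally create the parameters through Manage Parameters rather than in a <code>let</code> block. Note that <code>_top</code> is assumed to be text here, because the query joins it into the URL with <code>&amp;</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">let
    // Placeholder values for illustration only; replace them with your own
    _organization = &#34;my-organization&#34;,
    _project = &#34;my-project&#34;,
    // Text rather than a number, because it is concatenated into the URL with &amp;
    _top = &#34;100&#34;,
    // Same concatenation pattern as the main query, shortened to a single option
    Url = &#34;https://dev.azure.com/&#34; &amp; _organization &amp; &#34;/&#34; &amp; _project &amp; &#34;/_apis/git/pullrequests?$top=&#34; &amp; _top
in
    Url
</code></pre></div>
<p>If you prefer to define <code>_top</code> as a number, wrap it in <code>Number.ToText</code> before concatenating.</p>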
<h2 id="build-the-power-query">Build the Power Query</h2>
<h3 id="fetch-data-from-azure-devops">Fetch data from Azure DevOps</h3>
<p>Now that you have set up your Power BI report with parameters and prepared the necessary credentials, it&rsquo;s time to pull data from Azure DevOps. While Power BI has a built-in Azure DevOps connector, it only provides board data. To retrieve pull request information, we need to call the <a href="https://learn.microsoft.com/en-us/rest/api/azure/devops/git/pull-requests?view=azure-devops-rest-7.1" rel="nofollow">Azure DevOps REST APIs</a>.</p>
<p>See the following Power BI M query:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">Source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Json</span><span class="p">.</span><span class="n">Document</span><span class="p">(</span><span class="n">Web</span><span class="p">.</span><span class="n">Contents</span><span class="p">(</span><span class="s2">&#34;https://dev.azure.com/&#34;</span><span class="o">&amp;</span><span class="n">_organization</span><span class="o">&amp;</span><span class="s2">&#34;/&#34;</span><span class="o">&amp;</span><span class="n">_project</span><span class="o">&amp;</span><span class="s2">&#34;/_apis/git/pullrequests?searchCriteria.includeLinks=true&amp;searchCriteria.status=all&amp;$top=&#34;</span><span class="o">&amp;</span><span class="n">_top</span><span class="o">&amp;</span><span class="s2">&#34;&amp;api-version=7.1-preview.1&#34;</span><span class="p">)),</span><span class="w">
</span></span></span></code></pre></div><p>The <code>Web.Contents</code> function calls the REST API and returns the response as a <code>binary</code>. The <code>Json.Document</code> function then parses this binary as JSON. After this step, <code>Source</code> should be a <code>record</code> with two fields:</p>
<ul>
<li><code>value</code>: a list of all pull request records.</li>
<li><code>count</code>: the length of the <code>value</code> list.</li>
</ul>
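<p>The same request can also be written with the <code>RelativePath</code> and <code>Query</code> options of <code>Web.Contents</code> instead of one concatenated string. This is an alternative sketch, not the form used in the original query. It assumes the same three text parameters and keeps the base URL static, which is often suggested when a report has to refresh in the Power BI service.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">let
    Source = Json.Document(
        Web.Contents(
            // Static base URL; everything variable goes into the options record
            &#34;https://dev.azure.com&#34;,
            [
                RelativePath = _organization &amp; &#34;/&#34; &amp; _project &amp; &#34;/_apis/git/pullrequests&#34;,
                // Query values must be text; names containing dots or $ need the #&#34;...&#34; form
                Query = [
                    #&#34;searchCriteria.includeLinks&#34; = &#34;true&#34;,
                    #&#34;searchCriteria.status&#34; = &#34;all&#34;,
                    #&#34;$top&#34; = _top,
                    #&#34;api-version&#34; = &#34;7.1-preview.1&#34;
                ]
            ]
        )
    )
in
    Source
</code></pre></div>
<p>The result should be the same record with <code>value</code> and <code>count</code>, so the later steps do not need to change.</p>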
<h3 id="convert-to-table">Convert to Table</h3>
<p>Our previous step resulted in a JSON record containing the pull request data. To make this data available for further analysis, we need to convert it to a table.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="o">#</span><span class="s2">&#34;Converted to Table&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">FromRecords</span><span class="p">(</span><span class="err">{</span><span class="k">Source</span><span class="err">}</span><span class="p">,</span><span class="w"> </span><span class="err">{</span><span class="s2">&#34;value&#34;</span><span class="err">}</span><span class="p">),</span><span class="w">
</span></span></span></code></pre></div><p>The query above converts the <code>Source</code> record to a table that keeps only the <code>value</code> field. The returned table has only one column and one row, as shown below:</p>
<table>
  <thead>
      <tr>
          <th>value</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>List</td>
      </tr>
  </tbody>
</table>
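<p>To see what this conversion and the two expansions described next do, here is a small sketch that runs the same three steps on made-up data. The <code>Source</code> record below is a hypothetical stand-in for the API response, and only two fields are expanded to keep it short.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">let
    // Made-up stand-in for the parsed API response
    Source = [
        count = 2,
        value = {
            [pullRequestId = 1, status = &#34;active&#34;],
            [pullRequestId = 2, status = &#34;completed&#34;]
        }
    ],
    // One row, one column: the cell holds the whole list
    #&#34;Converted to Table&#34; = Table.FromRecords({Source}, {&#34;value&#34;}),
    // One row per list item: each cell now holds a single record
    #&#34;Expanded value&#34; = Table.ExpandListColumn(#&#34;Converted to Table&#34;, &#34;value&#34;),
    // One column per selected record field
    #&#34;Expanded value1&#34; = Table.ExpandRecordColumn(
        #&#34;Expanded value&#34;,
        &#34;value&#34;,
        {&#34;pullRequestId&#34;, &#34;status&#34;},
        {&#34;value.pullRequestId&#34;, &#34;value.status&#34;}
    )
in
    #&#34;Expanded value1&#34;
</code></pre></div>
<p>Pasted into a blank query, this should end with one row per pull request and the columns <code>value.pullRequestId</code> and <code>value.status</code>.</p>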
<p>To make the table usable, we need to transform it further. First, we expand the list into rows:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="o">#</span><span class="s2">&#34;Expanded value&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">ExpandListColumn</span><span class="p">(</span><span class="o">#</span><span class="s2">&#34;Converted to Table&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value&#34;</span><span class="p">),</span><span class="w">
</span></span></span></code></pre></div><p>Then, for each row, we expand the record into columns. We don&rsquo;t necessarily need every field, so the M query below extracts only the columns we need.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="o">#</span><span class="s2">&#34;Expanded value1&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">ExpandRecordColumn</span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="o">#</span><span class="s2">&#34;Expanded value&#34;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="s2">&#34;value&#34;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="err">{</span><span class="s2">&#34;repository&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;pullRequestId&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;codeReviewId&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;status&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;createdBy&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;creationDate&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;closedDate&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;title&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;description&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;sourceRefName&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;targetRefName&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;mergeStatus&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;isDraft&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;mergeId&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;reviewers&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;labels&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;url&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;completionOptions&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;supportsIterations&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;completionQueueTime&#34;</span><span class="err">}</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="err">{</span><span class="s2">&#34;value.repository&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.pullRequestId&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.codeReviewId&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.status&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.createdBy&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.creationDate&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.closedDate&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.title&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.description&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.sourceRefName&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.targetRefName&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.mergeStatus&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.isDraft&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.mergeId&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.reviewers&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.labels&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.url&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.completionOptions&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.supportsIterations&#34;</span><span class="p">,</span><span class="w"> </span><span 
class="s2">&#34;value.completionQueueTime&#34;</span><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="p">),</span><span class="w">
</span></span></span></code></pre></div><h3 id="continue-expanding-columns">Continue expanding columns</h3>
<p>Even though the previous steps gave us a solid starting point, some columns still have nested records full of useful data. We will perform additional expansions to access this data.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="cm">/*
</span></span></span><span class="line"><span class="cl"><span class="cm">value.repository, value.createdBy, value.completionOptions are records, we can expand them into columns
</span></span></span><span class="line"><span class="cl"><span class="cm">*/</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="o">#</span><span class="s2">&#34;Expanded value.repository&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">ExpandRecordColumn</span><span class="p">(</span><span class="o">#</span><span class="s2">&#34;Expanded value1&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.repository&#34;</span><span class="p">,</span><span class="w"> </span><span class="err">{</span><span class="s2">&#34;name&#34;</span><span class="err">}</span><span class="p">,</span><span class="w"> </span><span class="err">{</span><span class="s2">&#34;value.repository.name&#34;</span><span class="err">}</span><span class="p">),</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="o">#</span><span class="s2">&#34;Expanded value.createdBy&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">ExpandRecordColumn</span><span class="p">(</span><span class="o">#</span><span class="s2">&#34;Expanded value.repository&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.createdBy&#34;</span><span class="p">,</span><span class="w"> </span><span class="err">{</span><span class="s2">&#34;displayName&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;id&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;uniqueName&#34;</span><span class="err">}</span><span class="p">,</span><span class="w"> </span><span class="err">{</span><span class="s2">&#34;value.createdBy.displayName&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.createdBy.id&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.createdBy.uniqueName&#34;</span><span class="err">}</span><span class="p">),</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="o">#</span><span class="s2">&#34;Expanded value.completionOptions&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">ExpandRecordColumn</span><span class="p">(</span><span class="o">#</span><span class="s2">&#34;Expanded value.createdBy&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.completionOptions&#34;</span><span class="p">,</span><span class="w"> </span><span class="err">{</span><span class="s2">&#34;mergeCommitMessage&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;mergeStrategy&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;transitionWorkItems&#34;</span><span class="err">}</span><span class="p">,</span><span class="w"> </span><span class="err">{</span><span class="s2">&#34;value.completionOptions.mergeCommitMessage&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.completionOptions.mergeStrategy&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;value.completionOptions.transitionWorkItems&#34;</span><span class="err">}</span><span class="p">),</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="cm">/*
</span></span></span><span class="line"><span class="cl"><span class="cm">value.reviewers, on the other hand, is a list of records. For each list, we concatenate the displayName of every record
</span></span></span><span class="line"><span class="cl"><span class="cm">*/</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="o">#</span><span class="s2">&#34;Expanded value.reviewers&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">TransformColumns</span><span class="p">(</span><span class="o">#</span><span class="s2">&#34;Expanded value.completionOptions&#34;</span><span class="p">,</span><span class="w"> </span><span class="err">{{</span><span class="s2">&#34;value.reviewers&#34;</span><span class="p">,</span><span class="w"> </span><span class="k">each</span><span class="w"> </span><span class="n">Combiner</span><span class="p">.</span><span class="n">CombineTextByDelimiter</span><span class="p">(</span><span class="s2">&#34;, &#34;</span><span class="p">)(</span><span class="n">List</span><span class="p">.</span><span class="k">Transform</span><span class="p">(,</span><span class="w"> </span><span class="k">each</span><span class="w"> </span><span class="p">[</span><span class="n">displayName</span><span class="p">]))</span><span class="err">}}</span><span class="p">),</span><span class="w">
</span></span></span></code></pre></div><h3 id="add-details-from-other-apis">Add details from other APIs</h3>
<p>While the pull request endpoint provides a lot of useful information, it might not be enough. We often need to supplement our data with information from other Azure DevOps APIs to gain deeper insights. The process is similar to what we have done so far: pull data from the API and expand the JSON objects.</p>
<h4 id="iterations">Iterations</h4>
<p>Iterations are created as a result of creating and pushing updates to a pull request. Creating the pull request produces the first iteration, and each update pushed afterwards adds another one. Below is the Power BI M query to get the number of iterations for each pull request:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="o">#</span><span class="s2">&#34;Added iterations&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">AddColumn</span><span class="p">(</span><span class="o">#</span><span class="s2">&#34;Expanded value.reviewers&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;iterations&#34;</span><span class="p">,</span><span class="w"> </span><span class="k">each</span><span class="w"> </span><span class="n">Json</span><span class="p">.</span><span class="n">Document</span><span class="p">(</span><span class="n">Web</span><span class="p">.</span><span class="n">Contents</span><span class="p">([</span><span class="n">value</span><span class="p">.</span><span class="n">url</span><span class="p">]</span><span class="o">&amp;</span><span class="s2">&#34;/iterations/&#34;</span><span class="p">))),</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="o">#</span><span class="s2">&#34;Expanded iterations&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">ExpandRecordColumn</span><span class="p">(</span><span class="o">#</span><span class="s2">&#34;Added iterations&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;iterations&#34;</span><span class="p">,</span><span class="w"> </span><span class="err">{</span><span class="s2">&#34;count&#34;</span><span class="err">}</span><span class="p">,</span><span class="w"> </span><span class="err">{</span><span class="s2">&#34;iterations.count&#34;</span><span class="err">}</span><span class="p">),</span><span class="w">
</span></span></span></code></pre></div><h4 id="changes">Changes</h4>
<p>Another good metric to track is the number of files changed in each pull request. And we need to have the changes in all iterations, not just the initial pull request. Below is the code to retrieve the data from the API and extract the required information.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="o">#</span><span class="s2">&#34;Added iterations.changes&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">AddColumn</span><span class="p">(</span><span class="o">#</span><span class="s2">&#34;Expanded iterations&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;iterations.changes&#34;</span><span class="p">,</span><span class="w"> </span><span class="k">each</span><span class="w"> </span><span class="n">Json</span><span class="p">.</span><span class="n">Document</span><span class="p">(</span><span class="n">Web</span><span class="p">.</span><span class="n">Contents</span><span class="p">([</span><span class="n">value</span><span class="p">.</span><span class="n">url</span><span class="p">]</span><span class="o">&amp;</span><span class="s2">&#34;/iterations/&#34;</span><span class="o">&amp;</span><span class="nb">Number</span><span class="p">.</span><span class="n">ToText</span><span class="p">([</span><span class="n">iterations</span><span class="p">.</span><span class="k">count</span><span class="p">])</span><span class="o">&amp;</span><span class="s2">&#34;/changes?api-version=7.1-preview.1&#34;</span><span class="p">))),</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="o">#</span><span class="s2">&#34;Expanded iterations.changes&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">ExpandRecordColumn</span><span class="p">(</span><span class="o">#</span><span class="s2">&#34;Added iterations.changes&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;iterations.changes&#34;</span><span class="p">,</span><span class="w"> </span><span class="err">{</span><span class="s2">&#34;changeEntries&#34;</span><span class="err">}</span><span class="p">,</span><span class="w"> </span><span class="err">{</span><span class="s2">&#34;iterations.changes.changeEntries&#34;</span><span class="err">}</span><span class="p">),</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="o">#</span><span class="s2">&#34;Added iterations.changes.changeEntries.count&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">AddColumn</span><span class="p">(</span><span class="o">#</span><span class="s2">&#34;Expanded iterations.changes&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;iterations.changes.changeEntries.count&#34;</span><span class="p">,</span><span class="w"> </span><span class="k">each</span><span class="w"> </span><span class="n">List</span><span class="p">.</span><span class="k">Count</span><span class="p">([</span><span class="n">iterations</span><span class="p">.</span><span class="n">changes</span><span class="p">.</span><span class="n">changeEntries</span><span class="p">])),</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="o">#</span><span class="s2">&#34;Removed iterations.changes.changeEntries&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">RemoveColumns</span><span class="p">(</span><span class="o">#</span><span class="s2">&#34;Added iterations.changes.changeEntries.count&#34;</span><span class="p">,</span><span class="err">{</span><span class="s2">&#34;iterations.changes.changeEntries&#34;</span><span class="err">}</span><span class="p">),</span><span class="w">
</span></span></span></code></pre></div><h4 id="threads">Threads</h4>
<p>Threads are an Azure DevOps object for managing and organizing pull request discussions. Team members can discuss specific changes directly by adding one or more comments to each thread. Analyzing threads can give us many useful insights.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="o">#</span><span class="s2">&#34;Added threads&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">AddColumn</span><span class="p">(</span><span class="o">#</span><span class="s2">&#34;Removed iterations.changes.changeEntries&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;threads&#34;</span><span class="p">,</span><span class="w"> </span><span class="k">each</span><span class="w"> </span><span class="n">Json</span><span class="p">.</span><span class="n">Document</span><span class="p">(</span><span class="n">Web</span><span class="p">.</span><span class="n">Contents</span><span class="p">([</span><span class="n">value</span><span class="p">.</span><span class="n">url</span><span class="p">]</span><span class="o">&amp;</span><span class="s2">&#34;/threads?api-version=7.1-preview.1&#34;</span><span class="p">))),</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="o">#</span><span class="s2">&#34;Expanded threads&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">ExpandRecordColumn</span><span class="p">(</span><span class="o">#</span><span class="s2">&#34;Added threads&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;threads&#34;</span><span class="p">,</span><span class="w"> </span><span class="err">{</span><span class="s2">&#34;value&#34;</span><span class="err">}</span><span class="p">,</span><span class="w"> </span><span class="err">{</span><span class="s2">&#34;threads.value&#34;</span><span class="err">}</span><span class="p">),</span><span class="w">
</span></span></span></code></pre></div><p>For example, we can count the comment threads. A comment thread should have the <code>status</code> attribute (<code>Active</code>, <code>Resolved</code>, <code>Closed</code>), so the query below counts the threads that have this field:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="o">#</span><span class="s2">&#34;Added threads.value.commentCount&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">AddColumn</span><span class="p">(</span><span class="o">#</span><span class="s2">&#34;Expanded threads&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;threads.value.commentCount&#34;</span><span class="p">,</span><span class="w"> </span><span class="k">each</span><span class="w"> </span><span class="n">List</span><span class="p">.</span><span class="k">Sum</span><span class="p">(</span><span class="n">List</span><span class="p">.</span><span class="k">Transform</span><span class="p">([</span><span class="n">threads</span><span class="p">.</span><span class="n">value</span><span class="p">],</span><span class="w"> </span><span class="k">each</span><span class="w"> </span><span class="nb">Number</span><span class="p">.</span><span class="k">From</span><span class="p">(</span><span class="n">Record</span><span class="p">.</span><span class="n">HasFields</span><span class="p">(</span><span class="n">_</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;status&#34;</span><span class="p">))))),</span><span class="w">
</span></span></span></code></pre></div><p>Or we can get the approval or rejection information from the vote threads. A vote thread has a <code>CodeReviewThreadType</code> attribute with the value <code>VoteUpdate</code>. If the value of <code>CodeReviewVoteResult</code> is greater than 0, it is an approval. Otherwise, it is a rejection. The M query below gets the first approval time of a pull request.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="o">#</span><span class="s2">&#34;Added threads.value.firstApprovalTime&#34;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Table</span><span class="p">.</span><span class="n">AddColumn</span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="o">#</span><span class="s2">&#34;Added threads.value.commentCount&#34;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="s2">&#34;threads.value.firstApprovalTime&#34;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">each</span><span class="w"> </span><span class="n">List</span><span class="p">.</span><span class="k">Min</span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">List</span><span class="p">.</span><span class="k">Transform</span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="p">[</span><span class="n">threads</span><span class="p">.</span><span class="n">value</span><span class="p">],</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="k">each</span><span class="w"> </span><span class="k">if</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="n">Record</span><span class="p">.</span><span class="n">HasFields</span><span class="p">(</span><span class="n">_</span><span class="p">[</span><span class="n">properties</span><span class="p">],</span><span class="w"> </span><span class="s2">&#34;CodeReviewThreadType&#34;</span><span class="p">)</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">Record</span><span class="p">.</span><span class="n">Field</span><span class="p">(</span><span class="n">_</span><span class="p">[</span><span class="n">properties</span><span class="p">][</span><span class="n">CodeReviewThreadType</span><span class="p">],</span><span class="w"> </span><span class="s2">&#34;$value&#34;</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">&#34;VoteUpdate&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="k">and</span><span class="w"> </span><span class="n">Record</span><span class="p">.</span><span class="n">HasFields</span><span class="p">(</span><span class="n">_</span><span class="p">[</span><span class="n">properties</span><span class="p">],</span><span class="w"> </span><span class="s2">&#34;CodeReviewVoteResult&#34;</span><span class="p">)</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="nb">Number</span><span class="p">.</span><span class="n">FromText</span><span class="p">(</span><span class="n">Record</span><span class="p">.</span><span class="n">Field</span><span class="p">(</span><span class="n">_</span><span class="p">[</span><span class="n">properties</span><span class="p">][</span><span class="n">CodeReviewVoteResult</span><span class="p">],</span><span class="w"> </span><span class="s2">&#34;$value&#34;</span><span class="p">))</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="k">then</span><span class="w"> </span><span class="n">_</span><span class="p">[</span><span class="n">publishedDate</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="k">else</span><span class="w"> </span><span class="k">null</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="p">),</span><span class="w">
</span></span></span></code></pre></div><h2 id="full-source-code">Full source code</h2>
<p>You can grab the source code, paste it into the Power BI Power Query advanced editor, and customize it to suit your needs.</p>
<p><a href="https://gist.github.com/ThaiDat/9aa1f08ea1a1339973566325b1cf9af9">Full Query</a></p>
<h2 id="visualize-insights">Visualize insights</h2>
<p>Now we have a rich dataset. Power BI offers a wide range of visual elements to help you uncover trends, patterns, and insights. It&rsquo;s time to bring our data to life with stunning visualizations.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Remember, this is just the beginning. As your project evolves and your data grows, you can expand your report to include additional metrics, refine visualizations, and explore new insights. Continuous improvement is essential to maximizing the value of your data.</p>
<p>By creating a comprehensive pull request report, you are taking the initial step toward establishing a culture of data-driven decision-making, first within your development team, then throughout your organization.</p>
]]></content:encoded></item><item><title>How to start a successful Data Warehouse project</title><link>https://note.datengineer.dev/posts/how-to-build-a-successful-data-warehouse-project/</link><pubDate>Sun, 11 Aug 2024 00:00:00 +0000</pubDate><guid>https://note.datengineer.dev/posts/how-to-build-a-successful-data-warehouse-project/</guid><description>In this article on the key factors for launching a successful Data Warehouse project, we will explore key considerations that can help ensure that your Data Warehouse achieves its intended goals and delivers value to the organization.</description><content:encoded><![CDATA[<p>Any organization aiming to leverage the power of data-driven decision-making stands to benefit greatly from a successful Data Warehouse project. A well-designed Data Warehouse not only centralizes your data but also guarantees that it is reliable, scalable, maintainable, and usable by stakeholders.</p>
<p>Over the past few months, my team and I have launched a new Data Warehouse project in production. The opportunity to start from scratch is always a valuable chance to gain new insights and expertise. I would like to share the experiences from this success story in the hope that they will be as beneficial to others as they have been to us.</p>
<h2 id="understand-business-requirements">Understand Business Requirements</h2>
<p>The first step in starting any project, not only a Data Warehouse, is to fully understand the business requirements. This is the difference between success and failure, not just a formality. If you skip this step, I can tell you with certainty that your project will be a waste of time, energy, and resources.</p>
<p>To really understand what the business wants to see and what your team needs to do, you have to spend time talking to the people who will be using the Data Warehouse. What do they hope to accomplish? How will it help them do their jobs better? How do they plan to use the data? Getting a clear picture of their goals is essential to making sure your project is on the right track.</p>
<p><img alt="Importance of clear requirements in Data Warehouse project success" loading="lazy" src="/posts/how-to-build-a-successful-data-warehouse-project/images/business-requirements.png"></p>
<p>However, this is where things often get complicated. People often do not understand each other, especially people in different departments who have different perspectives, priorities, and terminology. <strong>Sometimes people do not even understand what they themselves are saying.</strong> Business people are easily attracted to marketing buzzwords on the Internet, believing that these terms are the solutions to their problems. I have to say that the marketing departments of data companies do a really good job of inventing new names for similar concepts. During this project, there were dozens of times when they said to me: let&rsquo;s use this tool, why not use this technology, money is not a problem (until they actually got the bill).</p>
<p>In one of my previous projects, a stakeholder told me that he wanted a visually stunning real-time dashboard that would make the numbers dance instantly whenever users did something in the web application. And I had to explain to him:</p>
<ul>
<li>Visually stunning: Yes, the data analysts team can always help you with that.</li>
<li>Real-time: There is no true real time. If the sun disappeared, we would only know about it roughly 8 minutes later. The same goes for data.</li>
<li>Numbers dancing instantly: We do not really need that. The business is not going to sit still and watch the numbers dance every second.</li>
</ul>
<p>Patience is the key. They do not understand those technical buzzwords. Yes. But isn&rsquo;t that why you are here as a technical specialist? Your responsibility is to listen to them, understand them, empathize with them, and tell them what you will do to help them. Your job is to translate their requirements into a workable solution.</p>
<p>Remember that the business stakeholders are not only the end users, but also the investors. Without their buy-in, the project can&rsquo;t even get off the ground. They are funding the project, and they deserve the best service.</p>
<p>By starting with a clear understanding of business requirements, you set the stage for a Data Warehouse project that is aligned with the organization&rsquo;s goals, ensuring that the final product delivers real value.</p>
<h2 id="understand-system">Understand System</h2>
<p>A Data Warehouse is not an isolated island. It is more like a bustling city that relies on a network of interconnected systems. It receives supplies from surrounding farms and industrial areas. Since a Data Warehouse pulls data from other systems, you cannot build a successful one without understanding how those systems work.</p>
<p>Imagine stakeholders telling you they want the sales figure. Then you need to know exactly which systems hold the sales number. How is that number populated in each system? It may be manually entered by users, automatically calculated, or synchronized from other sources; it may be read-only or editable. You need to know all the surrounding information to decide on the source of truth for the number you want. You may argue that all you need to do is copy the source database over and the business will know what to do with the data. Believe me, they don&rsquo;t. In fact, they have never seen the database a day in their lives. And you are the one who will tell them what they can do with your Data Warehouse.</p>
<p>Not knowing how the systems work also puts your project design at risk. You certainly don&rsquo;t want a surprise when you&rsquo;re almost done with the implementation, such as discovering a scheduled job that archives data from the database daily. If you had known that from the beginning, your design would have been very different.</p>
<p>Understanding the entire system in detail can be time-consuming. You should have a good sense of how the interconnected systems work together, but don&rsquo;t expect to understand them in detail at the beginning of your project. Instead, I would suggest building strong relationships with the teams responsible for maintaining these systems. Meet with them, tell them what you are doing, and ask for their advice and insights. They are a goldmine of information. You can also experiment with sandbox environments and databases to uncover hidden patterns and processes.</p>
<h2 id="design-a-reliable-data-warehouse">Design a reliable Data Warehouse</h2>
<p><a href="../what-is-a-reliable-data-system">Reliability is the backbone of any Data Warehouse</a>. If your business can&rsquo;t rely on the data coming out of your Data Warehouse, your project is a complete failure.</p>
<p>Having a solid testing strategy will greatly help. Testing is not just about finding bugs; it&rsquo;s about building confidence. When you start designing the Data Warehouse, think less about the times when the system is running happily, because there is nothing for us to do as long as it keeps running as it should. Think more about the times when the system is not working and what we are going to do then.</p>
<p><img alt="Bugs are inevitable. What matters is how you deal with them." loading="lazy" src="/posts/how-to-build-a-successful-data-warehouse-project/images/there-will-be-no-bug-if-you-dont-write-any-code.png"></p>
<p>And even if you do your best, bugs and issues will still happen. Don&rsquo;t expect your system to be bug-free; instead, build processes to handle issues as soon as they arise. And most importantly, be transparent. If the business comes to you and asks about an issue they found, tell them what happened and what you are doing to help. Transparency is the key to trust. <strong>If you tell a lie, you are part of the problem; if you are transparent, you are part of the solution.</strong> A reliable Data Warehouse isn&rsquo;t just about technology. It&rsquo;s about building trust.</p>
<h2 id="choose-the-right-tool-for-the-right-job">Choose the right tool for the right job</h2>
<p>To build a Data Warehouse, you need a toolbox filled with different pieces to complete the picture: tools for copying data, transforming it, orchestrating jobs, and more. It is technically possible to create the tools yourself, especially if you are in a big corporation and want to control every aspect of the technology. However, in most cases, it is impractical. You do not have enough resources to own the technology. Thus, developing a Data Warehouse solution usually means picking the available tools and services and making them work together.</p>
<p>The real challenge is choosing the right tools. Beware of your enemies, the shiny marketing promises. The person who writes those buzzwords may not be the one who writes the code. Sometimes I don&rsquo;t understand what they wrote, and I think they don&rsquo;t understand what they wrote either. These tools can be very expensive, so it is important to avoid overkill. Focus on what your business really needs, not just what sounds cool. We are not going to use the most popular or the most expensive tools; we are going to find the right fit for our specific needs.</p>
<h2 id="start-small-grow-big">Start small, Grow big</h2>
<p>Your investors do not have infinite patience. They want to see progress and value. Building something small but functional is far better than promising a grand project that never finishes. By starting small, you can quickly deliver value and gather feedback from users.</p>
<p>With limited resources, we cannot get everything done at once. It is important to prioritize. What matters most to your business? What will have the biggest impact on your customers? Concentrate on delivering those core features first. Breaking the project into phases is a good practice: each phase focuses on specific business requirements, data sources, or user groups, and you gradually expand the capabilities of the Data Warehouse.</p>
<h2 id="engage-users">Engage users</h2>
<p>A Data Warehouse is not just a technical marvel. It is a tool for your business. To ensure it delivers maximum value, you need to involve your users from the very beginning.</p>
<p>Imagine building a house without consulting the people who will live in it. They can still live in it, but they will never feel it is their home. By involving your users early and often, you will gain valuable insight into their needs, expectations, and challenges.</p>
<p>How can you engage your users?</p>
<ul>
<li>Involve them in the planning phase: Understand their data needs, pain points, and desired outcomes.</li>
<li>Provide regular updates: Keep them informed about project progress and involve them in decision-making.</li>
<li>Offer training and support: Equip users with the skills to effectively use the Data Warehouse.</li>
<li>Gather feedback: Encourage users to share their thoughts and suggestions for improvement.</li>
</ul>
<p>Remember that if you cannot engage your users, any number that looks slightly off in their reports will quickly become <strong>your</strong> problem. <strong>If you can engage them and make them feel like they are part of the project, then any issue will become everyone&rsquo;s problem.</strong></p>
<h2 id="conclusion">Conclusion</h2>
<p>Building a successful Data Warehouse is a challenging journey that requires careful planning, execution, and continuous improvement. It all starts with a deep understanding of the business requirements to ensure that every decision is aligned with the organization&rsquo;s goals. Start small, iterate often, and always keep the user at the center of your efforts. A successful Data Warehouse is a collaboration between the engineering team and the business. By working together, you can create a solution that truly delivers value.</p>
]]></content:encoded></item></channel></rss>