APPLIES TO: Azure Data Factory
Azure Synapse Analytics
The integration runtime (IR) is the compute infrastructure that Azure Data Factory and Synapse pipelines use to provide data integration capabilities across different network environments. For details about IR, see Integration runtime overview.
A self-hosted integration runtime can run copy activities between a cloud data store and a data store in a private network. It can also dispatch transform activities against compute resources in an on-premises network or an Azure virtual network. Installing a self-hosted integration runtime requires an on-premises machine or a virtual machine inside a private network.
This article describes how to create and configure a self-hosted IR.
Note
We recommend that you use the Azure Az PowerShell module to interact with Azure. To get started, see Install Azure PowerShell. To learn how to migrate to the Az PowerShell module, see Migrate Azure PowerShell from AzureRM to Az.
Considerations for using a self-hosted IR
- You can use a single self-hosted integration runtime for multiple on-premises data sources. You can also share it with another data factory in the same Azure Active Directory (Azure AD) tenant. For more information, see Sharing a self-hosted integration runtime.
- You can install only one instance of the self-hosted integration runtime on a single machine. If you have two data factories that need to access on-premises data sources, either use the self-hosted IR sharing feature to share the self-hosted IR, or install the self-hosted IR on two on-premises machines, one for each data factory or Synapse workspace. Synapse workspaces don't support integration runtime sharing.
- The self-hosted integration runtime does not need to be on the same machine as the data source. However, when the self-hosted Integration Runtime is close to the data source, the time it takes for the self-hosted Integration Runtime to connect to the data source is reduced. We recommend installing the self-hosted Integration Runtime on a different computer than the one hosting the local data source. If the self-hosted Integration Runtime and the data source are on different machines, the self-hosted Integration Runtime will not compete with the data source for resources.
- You can have multiple self-hosted integration runtimes on different machines that connect to the same local data source. For example, if you have two self-hosted integration runtimes serving two data factories, the same local data source can be registered with both data factories.
- Use a self-hosted integration runtime to support data integration in an Azure virtual network.
- Treat your data source as an on-premises data source behind a firewall, even if you use Azure ExpressRoute. Use the self-hosted Integration Runtime to connect the service to the data source.
- Use the self-hosted integration runtime even when your data warehouse is in the cloud on an Azure IaaS (Infrastructure as a Service) virtual machine.
- Tasks can fail on a self-hosted integration runtime installed on a Windows server that has FIPS-compliant encryption enabled. To work around this issue, you have two options: store credentials and secrets in Azure Key Vault, or disable FIPS-compliant encryption on the server. To disable FIPS-compliant encryption, change the value of the following registry subkey from 1 (enabled) to 0 (disabled): HKLM\System\CurrentControlSet\Control\Lsa\FIPSAlgorithmPolicy\Enabled. If you use the self-hosted integration runtime as a proxy for the SSIS integration runtime, FIPS-compliant encryption can stay enabled and is used when moving data from on-premises to Azure Blob Storage as a staging area.
- For complete licensing details, see the first page of the self-hosted integration runtime setup.
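The FIPS workaround described above can be checked from PowerShell before anything is changed. The following is a minimal sketch, assuming the policy flag lives under the standard HKLM\SYSTEM hive; the helper name Get-FipsState is hypothetical and exists only for this illustration.

```powershell
# Registry location of the FIPS policy flag (1 = enabled, 0 = disabled).
$fipsKey = 'HKLM:\SYSTEM\CurrentControlSet\Control\Lsa\FIPSAlgorithmPolicy'

# Hypothetical helper: turns the raw registry value into a readable state.
function Get-FipsState {
    param([Parameter(Mandatory)][int]$EnabledValue)
    if ($EnabledValue -eq 1) { 'enabled' } else { 'disabled' }
}

# The HKLM: drive exists only on Windows, so guard the read.
if (Test-Path $fipsKey) {
    $value = (Get-ItemProperty -Path $fipsKey -Name Enabled).Enabled
    Write-Output "FIPS-compliant encryption is $(Get-FipsState -EnabledValue $value)."
    # To disable it (elevated session required), uncomment the next line:
    # Set-ItemProperty -Path $fipsKey -Name Enabled -Value 0
}
```

The write is left commented out on purpose, because disabling FIPS is a server-wide policy change that usually needs sign-off from whoever owns security on the machine.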
Note
Currently, a self-hosted integration runtime can be shared only across multiple data factories. It can't be shared across Synapse workspaces or between a data factory and a Synapse workspace.
Command flow and data flow
When moving data between on-premises and the cloud, the activity uses a self-hosted integration runtime to transfer the data between an on-premises data source and the cloud.
Here is a high-level summary of the data flow steps to copy with a self-hosted IR:
A data developer first creates a self-hosted integration runtime in an Azure Data Factory or Synapse workspace using the Azure portal or PowerShell cmdlet. The data developer creates a linked service for a local data store and specifies the self-hosted integration runtime that the service should use to connect to the data stores.
The self-hosted integration runtime node encrypts the credentials using the Windows Data Protection Application Programming Interface (DPAPI) and stores the credentials locally. When multiple nodes are configured for high availability, credentials continue to be synchronized between other nodes. Each node encrypts credentials using DPAPI and stores them locally. Credential synchronization is transparent to the data developer and is handled by the self-hosted IR.
Azure Data Factory and Synapse pipelines communicate with the self-hosted integration runtime to schedule and manage jobs. Communication takes place over a control channel that uses an Azure Relay connection. When an activity job needs to run, the service queues the request along with any credential information, in case the credentials aren't already stored on the self-hosted integration runtime. The self-hosted integration runtime starts the job after it polls the queue.
The self-hosted integration runtime copies data between local storage and cloud storage. The copy direction depends on how the copy activity is configured in the data pipeline. For this step, the self-hosted integration runtime communicates directly with cloud-based storage services such as Azure Blob Storage over a secure HTTPS channel.
Prerequisites
- Supported Windows versions are:
- Windows 8.1
- Windows 10
- Windows 11
- Windows Server 2012
- Windows Server 2012 R2
- Windows Server 2016
- Windows Server 2019
- Windows Server 2022
Installing the self-hosted Integration Runtime on a domain controller is not supported.
- The self-hosted integration runtime requires a 64-bit operating system with .NET Framework 4.7.2 or later. For details, see .NET Framework System Requirements.
- The recommended minimum configuration for the self-hosted integration runtime machine is a 2-GHz processor with 4 cores, 8 GB of RAM, and 80 GB of available disk space. For details on system requirements, see Download.
- When the host machine goes to sleep, the self-hosted integration runtime does not respond to data requests. Configure an appropriate power plan on the computer before you install the self-hosted integration runtime. If the computer is configured to sleep, the self-hosted integration runtime installer displays a message.
- You must be an administrator on the computer to successfully install and configure the self-hosted Integration Runtime.
- Copy activities are performed with a certain frequency. Processor and RAM usage on the computer follows the same pattern with peak and idle times. Resource usage is also highly dependent on the amount of data being moved. If multiple copy jobs are running, you will see an increase in resource usage during peak hours.
- Tasks might fail when they extract data in Parquet, ORC, or Avro formats. For more information about Parquet, see Parquet format in Azure Data Factory. File creation runs on the self-hosted integration runtime machine. For file creation to work as expected, the following prerequisites are required:
- Visual C++ 2010 Redistributable Package (x64)
- Java Runtime (JRE) version 11 from a JRE provider such as Eclipse Temurin. Make sure that the JAVA_HOME environment variable is set to the JDK folder (not just the JRE folder). You might also need to add the bin folder to your system's PATH environment variable.
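The JAVA_HOME requirement above can be sanity-checked before running a Parquet, ORC, or Avro copy. This is a minimal sketch: the helper name Test-JdkFolder is hypothetical, and the check is a heuristic that assumes a JDK folder contains the javac compiler in its bin folder while a bare JRE folder does not.

```powershell
# Hypothetical helper: returns $true when the folder looks like a JDK.
# Heuristic (an assumption, not an official check): a JDK ships the javac
# compiler in its bin folder, while a bare JRE does not.
function Test-JdkFolder {
    param([string]$Path)
    if ([string]::IsNullOrWhiteSpace($Path)) { return $false }
    $bin = Join-Path $Path 'bin'
    return (Test-Path (Join-Path $bin 'javac.exe')) -or (Test-Path (Join-Path $bin 'javac'))
}

if (Test-JdkFolder -Path $env:JAVA_HOME) {
    Write-Output "JAVA_HOME points to a JDK folder: $env:JAVA_HOME"
} else {
    Write-Output 'JAVA_HOME is not set or does not point to a JDK folder.'
}
```

A passing check only confirms the folder layout. It does not verify the Java version, which still needs to be 11 as stated above.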
Note
If you run into memory errors, you might need to adjust the Java settings as described in the Parquet format documentation.
Note
If you are running in a government cloud, see Connect to government cloud.
Configuring a self-hosted integration runtime
Use the following procedures to create and configure a self-hosted integration runtime.
Create a self-hosted IR via Azure PowerShell
You can use Azure PowerShell for this task. Here is an example:
Set-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $selfHostedIntegrationRuntimeName -Type SelfHosted -Description "selfhosted IR description"
Download and install the self-hosted integration runtime on a local machine.
Get the authentication key and register the self-hosted integration runtime with the key. Here is an example from PowerShell:
Get-AzDataFactoryV2IntegrationRuntimeKey -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $selfHostedIntegrationRuntimeName
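The two commands above share the same three parameters, so they can be combined into one short script. The following is a sketch under stated assumptions: the resource names are placeholders, the Az.DataFactory module is installed, and you have already signed in with Connect-AzAccount.

```powershell
# Placeholder names (assumptions): replace them with your own values.
$resourceGroupName = 'my-resource-group'
$dataFactoryName = 'my-data-factory'
$selfHostedIntegrationRuntimeName = 'my-selfhosted-ir'

# Parameters shared by both cmdlets, passed by splatting.
$irParams = @{
    ResourceGroupName = $resourceGroupName
    DataFactoryName   = $dataFactoryName
    Name              = $selfHostedIntegrationRuntimeName
}

# Run the Azure calls only when the Az.DataFactory cmdlets are available.
if (Get-Command Set-AzDataFactoryV2IntegrationRuntime -ErrorAction SilentlyContinue) {
    Set-AzDataFactoryV2IntegrationRuntime @irParams -Type SelfHosted -Description 'selfhosted IR description'
    $keys = Get-AzDataFactoryV2IntegrationRuntimeKey @irParams
    # AuthKey1 is one of the two keys you can paste when registering a node.
    $keys.AuthKey1
}
```

Treat the returned key as a secret: avoid writing it to logs or committing it to source control.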
Note
To run PowerShell commands in Azure Government, see Connect to Azure Government with PowerShell.
Create a self-hosted IR from the UI
Complete the following steps to create a self-hosted IR using Azure Data Factory or the Azure Synapse UI.
On the home page of the Azure Data Factory UI, select the Manage tab from the leftmost pane.
Select Integration runtimes on the left pane, and then select +New.
On the Integration runtime setup page, select Azure, Self-Hosted, and then select Continue.
On the following page, select Self-Hosted to create a self-hosted IR, and then select Continue.
Configuring a self-hosted IR from the UI
Enter a name for your IR, and select Create.
On the Integration runtime setup page, select the link under Option 1 to open the express setup on your computer. Or follow the steps under Option 2 to set up manually. The following instructions are based on manual setup:
Copy and paste the authentication key. Select Download and install integration runtime.
Download the self-hosted integration runtime on a local Windows machine. Run the installer.
On the Register Integration Runtime (Self-hosted) page, paste the key you saved earlier, and select Register.
On the New Integration Runtime (Self-hosted) Node page, select Finish.
After the self-hosted Integration Runtime is successfully registered, you will see the following window:
Configure a self-hosted IR on an Azure VM using an Azure Resource Manager template
You can automate self-hosted IR setup on an Azure virtual machine by using the Create self host IR template. The template provides an easy way to have a fully functional self-hosted IR inside an Azure virtual network. The IR has high-availability and scalability features, as long as you set the node count to 2 or higher.
Configure an existing self-hosted IR using local PowerShell
You can use a command line to configure or manage an existing self-hosted IR. In particular, this usage can help automate the installation and registration of self-hosted IR nodes.
Dmgcmd.exe is included in the self-hosted installer. It is usually located in the C:\Program Files\Microsoft Integration Runtime\4.0\Shared\ folder. This application supports multiple parameters and can be called from a command line with batch scripts for automation.
Use the app as follows:
dmgcmd ACTION args...
Here are the details on the application's actions and arguments:
ACTION | Arguments | Description |
---|---|---|
-rn, -RegisterNewNode | "<AuthenticationKey>" ["<NodeName>"] | Register a self-hosted integration runtime node with the specified authentication key and node name. |
-era, -EnableRemoteAccess | "<port>" ["<thumbprint>"] | Enable remote access on the current node to set up a high-availability cluster. Or enable setting credentials directly against the self-hosted IR without going through an Azure Data Factory or Azure Synapse workspace. You do the latter by using the New-AzDataFactoryV2LinkedServiceEncryptedCredential cmdlet from a remote machine on the same network. |
-erac, -EnableRemoteAccessInContainer | "<port>" ["<thumbprint>"] | Enable remote access to the current node when the node runs in a container. |
-dra, -DisableRemoteAccess | | Disable remote access on the current node. Remote access is required for multinode setup. The New-AzDataFactoryV2LinkedServiceEncryptedCredential PowerShell cmdlet still works when remote access is disabled, as long as the cmdlet runs on the same machine as the self-hosted IR node. |
-k, -Key | "<AuthenticationKey>" | Overwrite or update the previous authentication key. Be careful with this action. Your previous self-hosted IR node can go offline if the key belongs to a new integration runtime. |
-gbf, -GenerateBackupFile | "<filePath>" "<password>" | Generate a backup file for the current node. The backup file contains the node key and data store credentials. |
-ibf, -ImportBackupFile | "<filePath>" "<password>" | Restore the node from a backup file. |
-r, -Restart | | Restart the self-hosted integration runtime host service. |
-s, -Start | | Start the self-hosted integration runtime host service. |
-t, -Stop | | Stop the self-hosted integration runtime host service. |
-sus, -StartUpgradeService | | Start the self-hosted integration runtime upgrade service. |
-tus, -StopUpgradeService | | Stop the self-hosted integration runtime upgrade service. |
-tonau, -TurnOnAutoUpdate | | Turn on auto-update of the self-hosted integration runtime. This command applies to Azure Data Factory V1 only. |
-toffau, -TurnOffAutoUpdate | | Turn off auto-update of the self-hosted integration runtime. This command applies to Azure Data Factory V1 only. |
-ssa, -SwitchServiceAccount | "<domain\user>" ["<password>"] | Set DIAHostService to run as a new account. Use the empty password "" for system accounts and virtual accounts. |
-elma, -EnableLocalMachineAccess | | Enable local machine access (localhost, private IP) on the current self-hosted IR node. In a self-hosted IR high-availability scenario, the action must be invoked on every self-hosted IR node. |
-dlma, -DisableLocalMachineAccess | | Disable local machine access (localhost, private IP) on the current self-hosted IR node. In a self-hosted IR high-availability scenario, the action must be invoked on every self-hosted IR node. |
-DisableLocalFolderPathValidation | | Disable security validation to allow access to the local machine's file system. |
-EnableLocalFolderPathValidation | | Enable security validation to block access to the local machine's file system. |
-eesp, -EnableExecuteSsisPackage | | Enable SSIS package execution on the self-hosted IR node. |
-desp, -DisableExecuteSsisPackage | | Disable SSIS package execution on the self-hosted IR node. |
-gesp, -GetExecuteSsisPackage | | Get the value that shows whether the ExecuteSsisPackage option is enabled on the self-hosted IR node. |
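As an illustration of the command-line usage above, a node registration can be scripted from PowerShell. This is a minimal sketch, assuming the default install folder mentioned earlier; the helper New-RegisterNodeArguments and the node name ir-node-01 are hypothetical, and the authentication key is left as a placeholder.

```powershell
# Default install location mentioned above; adjust it if yours differs.
$dmgcmd = 'C:\Program Files\Microsoft Integration Runtime\4.0\Shared\dmgcmd.exe'

# Hypothetical helper: builds the argument list for the -RegisterNewNode action.
function New-RegisterNodeArguments {
    param(
        [Parameter(Mandatory)][string]$AuthenticationKey,
        [string]$NodeName
    )
    $arguments = @('-RegisterNewNode', $AuthenticationKey)
    # The node name is optional, so append it only when one was supplied.
    if ($NodeName) { $arguments += $NodeName }
    return $arguments
}

$arguments = New-RegisterNodeArguments -AuthenticationKey '<authentication key>' -NodeName 'ir-node-01'
Write-Output "Would run: `"$dmgcmd`" $($arguments -join ' ')"

# After substituting a real key, run the command from an elevated session:
# & $dmgcmd @arguments
```

The invocation is commented out so that the sketch can be read and adapted without registering a node by accident.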
Install and register a self-hosted IR from the Microsoft Download Center
Go to the Microsoft integration runtime download page.
Select Download, select the 64-bit version, and select Next. The 32-bit version is not supported.
Run the MSI file directly or save it to your hard drive and run it.
In the Welcome window, choose a language and select Next.
Accept the Microsoft Software License Terms and select Next.
Select the folder in which to install the self-hosted integration runtime, and select Next.
On the Ready to install page, select Install.
Select Finish to complete the installation.
Get the authentication key using PowerShell. Here is a PowerShell example to get the authentication key:
Get-AzDataFactoryV2IntegrationRuntimeKey -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $selfHostedIntegrationRuntimeName
In the Register Integration Runtime (Self-hosted) window of Microsoft Integration Runtime Configuration Manager running on your machine, take the following steps:
Paste the authentication key into the text area.
Optionally, select Show authentication key to see the key text.
Select Register.
Note
Release notes are available on the same Microsoft integration runtime download page.
Service account for self-hosted integration runtime
The default sign-in service account of the self-hosted integration runtime is NT SERVICE\DIAHostService. You can see it in Services -> Integration Runtime Service -> Properties -> Log on.
Make sure the account has the Log on as a service permission. Otherwise, the self-hosted integration runtime cannot start correctly. You can check the permission in Local Security Policy -> Security Settings -> Local Policies -> User Rights Assignment -> Log on as a service.
Notification area icons and notifications
By hovering over the icon or message in the notification area, you can view details about the status of the self-hosted integration runtime.
High availability and scalability
You can associate a self-hosted integration runtime with multiple on-premises machines or virtual machines in Azure. These machines are called nodes. You can have up to four nodes associated with a self-hosted integration runtime. The benefits of having multiple nodes on on-premises machines that have a gateway installed for a logical gateway are:
- Higher availability of the self-hosted integration runtime, so that it is no longer the single point of failure in your big data or cloud data integration solution. This availability helps ensure continuity when you use up to four nodes.
- Improved throughput and performance when moving data between on-premises and cloud data stores. For more information, see Performance comparisons.
You can associate multiple nodes by installing the self-hosted integration runtime software from Download Center. Then register each node by using either of the authentication keys obtained from the New-AzDataFactoryV2IntegrationRuntimeKey cmdlet, as described in the tutorial.
Note
You don't need to create a new self-hosted integration runtime to associate each node. You can install the self-hosted integration runtime on another computer and register it with the same authentication key.
Note
Before you add another node for high availability and scalability, make sure that the Remote access to intranet option is enabled on the first node. To do so, select Microsoft Integration Runtime Configuration Manager > Settings > Remote access to intranet.
Scale considerations
Scale out
If processor usage is high and available memory is low on the self-hosted IR, add a new node to spread the load across multiple machines. If activities fail because they time out or because the self-hosted IR node is offline, adding a node to the gateway also helps.
Scale up
If the processor and available memory are not well utilized, but the number of concurrent jobs is reaching a node's limit, scale up by increasing the number of concurrent jobs that a node can run. You might also want to scale up when activities time out because the self-hosted IR is overloaded. As shown in the following figure, you can increase the maximum capacity of a node:
TLS/SSL certificate requirements
If you want to enable remote access from the intranet with a TLS/SSL certificate (Advanced) to secure communication between integration runtime nodes, follow the steps in Enable remote access from intranet with TLS/SSL certificate.
Note
This certificate is used:
- To encrypt ports on a self-hosted IR node.
- For node-to-node communication for state synchronization, including synchronization of linked service credentials across nodes.
- When a PowerShell cmdlet is used to set linked service credentials from within a local network.
We recommend that you use this certificate if your private network environment is not secure or if you want to secure the communication between nodes in your private network.
Data transfer when moving from a self-hosted IR to other data stores is always done over an encrypted channel, regardless of whether this certificate is specified or not.
Credential sync
If you don't store the credentials or secrets in Azure Key Vault, the credentials or secrets are stored on the computers where your self-hosted integration runtime resides. Each node has a copy of the credentials with a specific version. For all nodes to work together, the version number must be the same for all nodes.
Proxy server considerations
If your corporate network environment uses a proxy server to access the Internet, configure the self-hosted Integration Runtime to use the appropriate proxy settings. You can configure the proxy during the initial registration phase.
When configured, the self-hosted integration runtime uses the proxy server to connect to the cloud service's source and destination (which use the HTTP or HTTPS protocol). To configure it, select the Change link during initial setup.
There are three configuration options:
- Do not use proxy: The self-hosted integration runtime does not explicitly use any proxy to connect to cloud services.
- Use system proxy: The self-hosted integration runtime uses the proxy settings defined in diahost.exe.config and diawp.exe.config. If these files don't specify a proxy configuration, the self-hosted integration runtime connects directly to the cloud service without using a proxy.
- Use custom proxy: Configure the HTTP proxy settings to use for the self-hosted integration runtime instead of the settings in diahost.exe.config and diawp.exe.config. Address and Port values are required. User Name and Password values are optional, depending on your proxy's authentication settings. All settings are encrypted with Windows DPAPI on the self-hosted integration runtime and are stored locally on your computer.
The Integration Runtime host service restarts automatically after saving the updated proxy settings.
If you want to view or update proxy settings after registering the self-hosted Integration Runtime, use the Microsoft Integration Runtime Configuration Manager.
- Open Microsoft Integration Runtime Configuration Manager.
- Select the Settings tab.
- Under HTTP Proxy, select the Change link to open the Set HTTP Proxy dialog box.
- Select Next. A prompt appears asking for permission to save the proxy settings and restart the integration runtime host service.
You can use the Configuration Manager tool to view and update the HTTP proxy.
Note
If you configure a proxy server with NTLM authentication, the Integration Runtime host service runs under the domain account. If you later change the domain account password, remember to update the service's configuration settings and restart the service. Due to this requirement, we recommend that you access the proxy server using a dedicated domain account that does not require frequent password updates.
Configure proxy server settings
If you select the Use system proxy option for the HTTP proxy, the self-hosted integration runtime uses the proxy settings in diahost.exe.config and diawp.exe.config. If these files don't specify a proxy, the self-hosted integration runtime connects directly to the cloud service without using a proxy. The following procedure provides instructions for updating the diahost.exe.config file:
In File Explorer, make a safe copy of C:\Program Files\Microsoft Integration Runtime\4.0\Shared\diahost.exe.config as a backup of the original file.
Open Notepad running as an administrator.
In Notepad, open the text file C:\Program Files\Microsoft Integration Runtime\4.0\Shared\diahost.exe.config.
Find the default system.net tag as shown in the following code:
<system.net>
    <defaultProxy useDefaultCredentials="true" />
</system.net>
You can then add the proxy server details as shown in the following example:
<system.net>
    <defaultProxy enabled="true">
        <proxy bypassonlocal="true" proxyaddress="http://proxy.domain.org:8888/" />
    </defaultProxy>
</system.net>
The proxy tag allows additional properties to specify required settings such as scriptLocation. See <proxy> Element (Network Settings) for the syntax.
<proxy autoDetect="true|false|unspecified" bypassonlocal="true|false|unspecified" proxyaddress="uriString" scriptLocation="uriString" usesystemdefault="true|false|unspecified"/>
Save the configuration file in its original location. Then restart the self-hosted integration runtime host service, which picks up the changes.
To restart the service, use the Services applet in Control Panel. Or in Integration Runtime Configuration Manager, select the Stop Service button, and then select Start Service.
If the service does not start, you have probably added incorrect XML tag syntax in the edited application configuration file.
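Because a hand-edited tag is the usual reason the service fails to start, the same change can be made through an XML parser, which keeps the file well formed. The following is a minimal sketch, not the product's own tooling: the helper Set-DefaultProxy is hypothetical, the proxy address is the example value used above, and the script assumes an elevated session and the default install folder.

```powershell
# Hypothetical helper: replaces the <defaultProxy> element under <system.net>
# with one that points at the given proxy address. Edits the document in place.
function Set-DefaultProxy {
    param(
        [System.Xml.XmlDocument]$Config,
        [Parameter(Mandatory)][string]$ProxyAddress
    )
    $systemNet = $Config.SelectSingleNode('/configuration/system.net')
    if ($null -eq $systemNet) {
        $systemNet = $Config.CreateElement('system.net')
        [void]$Config.DocumentElement.AppendChild($systemNet)
    }
    # Drop any existing <defaultProxy> so only one remains afterwards.
    $existing = $systemNet.SelectSingleNode('defaultProxy')
    if ($null -ne $existing) { [void]$systemNet.RemoveChild($existing) }

    $defaultProxy = $Config.CreateElement('defaultProxy')
    $defaultProxy.SetAttribute('enabled', 'true')
    $proxy = $Config.CreateElement('proxy')
    $proxy.SetAttribute('bypassonlocal', 'true')
    $proxy.SetAttribute('proxyaddress', $ProxyAddress)
    [void]$defaultProxy.AppendChild($proxy)
    [void]$systemNet.AppendChild($defaultProxy)
}

$configPath = 'C:\Program Files\Microsoft Integration Runtime\4.0\Shared\diahost.exe.config'
$proxyAddress = 'http://proxy.domain.org:8888/'   # example address; replace with yours

if (Test-Path $configPath) {
    Copy-Item -Path $configPath -Destination "$configPath.bak"   # keep a backup first
    $config = New-Object System.Xml.XmlDocument
    $config.Load($configPath)
    Set-DefaultProxy -Config $config -ProxyAddress $proxyAddress
    $config.Save($configPath)
}
```

The same function can be pointed at diawp.exe.config by changing $configPath. The host service still has to be restarted afterwards for the new settings to take effect.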
Important
Don't forget to update both diahost.exe.config and diawp.exe.config.
You also need to make sure that Microsoft Azure is in your organization's allowlist. You can download the list of valid Azure IP addresses. The IP ranges for each cloud, broken down by region and by the tagged services in that cloud, are available on the Microsoft Download Center:
- Public: https://www.microsoft.com/download/details.aspx?id=56519
- US Government: https://www.microsoft.com/download/details.aspx?id=57063
- Germany: https://www.microsoft.com/download/details.aspx?id=57064
- China: https://www.microsoft.com/download/details.aspx?id=57062
Possible symptoms of firewall and proxy server issues
If you see error messages like the following, the likely cause is an incorrect firewall or proxy server configuration. This setting prevents the self-hosted integration runtime from connecting to Data Factory or Synapse pipelines to authenticate. To ensure that the firewall and proxy server are configured correctly, refer to the previous section.
When you try to register the self-hosted integration runtime, you receive the following error message: "Failed to register this Integration Runtime node! Confirm that the Authentication key is valid and the integration service host service is running on this machine."
When you open Integration Runtime Configuration Manager, you see a status of Disconnected or Connecting. When you view the Windows event logs, under Event Viewer > Application and Services Logs > Microsoft Integration Runtime, you see error messages like these:
Unable to connect to the remote server
A component of Integration Runtime has become unresponsive and restarts automatically. Component name: Integration Runtime (Self-hosted).
Enable remote access over an intranet
If you use PowerShell to encrypt credentials from a networked machine other than the one where you installed the self-hosted integration runtime, you can enable the Remote access from intranet option. If you run PowerShell to encrypt credentials on the machine where you installed the self-hosted integration runtime, you can't enable Remote access from intranet.
Enable Remote access from intranet before you add another node for high availability and scalability.
When you run self-hosted integration runtime setup version 3.3 or later, the installer by default disables Remote access from intranet on the self-hosted integration runtime machine.
When you use a firewall from a partner or another third party, you can manually open port 8060 or the user-configured port. If you encounter a firewall issue while setting up the self-hosted integration runtime, use the following command to install the self-hosted integration runtime without configuring the firewall:
msiexec /q /i IntegrationRuntime.msi NOFIREWALL=1
If you choose not to open port 8060 on the self-hosted integration runtime machine, use mechanisms other than the Setting Credentials application to configure data store credentials. For example, you can use the New-AzDataFactoryV2LinkedServiceEncryptedCredential PowerShell cmdlet.
Ports and Firewalls
There are two firewalls to consider:
- The corporate firewall that runs on the organization's central router
- The Windows firewall that is configured as a daemon on the local machine where the self-hosted integration runtime is installed
At the corporate firewall level, you must configure the following domains and outbound ports:
Domain names | Outbound ports | Description |
---|---|---|
Public cloud: *.servicebus.windows.net Azure Government: *.servicebus.usgovcloudapi.net China: *.servicebus.chinacloudapi.cn | 443 | Required by the self-hosted integration runtime for interactive authoring. |
Public cloud: {datafactory}.{region}.datafactory.azure.net or *.frontend.clouddatahub.net Azure Government: {datafactory}.{region}.datafactory.azure.us China: {datafactory}.{region}.datafactory.azure.cn | 443 | Required by the self-hosted integration runtime to connect to the Data Factory service. For a newly created data factory in the public cloud, find the FQDN in your self-hosted integration runtime key, which is in the format {datafactory}.{region}.datafactory.azure.net. For an older data factory and for Azure Synapse Analytics, if you don't see the FQDN in your self-hosted integration runtime key, use *.frontend.clouddatahub.net instead. |
download.microsoft.com | 443 | Required by the self-hosted integration runtime to download updates. If you have disabled auto-update, you can skip this domain. |
Key Vault URL | 443 | Required by Azure Key Vault if you store credentials in Key Vault. |
At the Windows firewall or at the computer level, these outbound ports are usually enabled. Otherwise, you can configure domains and ports on a self-hosted integration runtime machine.
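Outbound reachability of the domains and ports listed above can be probed from the self-hosted integration runtime machine. The following is a minimal sketch: the helper Test-TcpPort is hypothetical and uses the .NET TcpClient class, and only a concrete host name from the table is probed, because wildcard entries have to be replaced with the actual FQDNs that your runtime uses. On Windows, the built-in Test-NetConnection cmdlet is an alternative.

```powershell
# Hypothetical helper: returns $true if a TCP connection to the host and port
# succeeds within the timeout. It only proves basic reachability, not that a
# proxy or inspection device will let the runtime's traffic through.
function Test-TcpPort {
    param(
        [Parameter(Mandatory)][string]$HostName,
        [Parameter(Mandatory)][int]$Port,
        [int]$TimeoutMs = 3000
    )
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $task = $client.ConnectAsync($HostName, $Port)
        return ($task.Wait($TimeoutMs) -and $client.Connected)
    } catch {
        return $false   # DNS failure, refused connection, and so on
    } finally {
        $client.Close()
    }
}

# Concrete host names only; add the FQDNs that apply to your environment.
$endpoints = @(
    @{ HostName = 'download.microsoft.com'; Port = 443 }
)
foreach ($endpoint in $endpoints) {
    $ok = Test-TcpPort -HostName $endpoint.HostName -Port $endpoint.Port
    Write-Output ("{0}:{1} reachable: {2}" -f $endpoint.HostName, $endpoint.Port, $ok)
}
```

A result of False narrows the problem to name resolution or a blocked outbound port, which is where the corporate and Windows firewall rules described above come in.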
Note
Because Azure Relay doesn't currently support a service tag, you have to use the service tag AzureCloud or Internet in NSG rules for communication with Azure Relay. For communication with Azure Data Factory and Synapse workspaces, you can use the service tag DataFactoryManagement in the NSG rule setup.
Depending on your sources and sinks, you may need to allow additional domains and outbound ports on your corporate firewall or Windows firewall.
Domain names | Outbound ports | Description |
---|---|---|
*.core.windows.net | 443 | Used by the self-hosted integration runtime to connect to the Azure storage account when you use the staged copy feature. |
*.database.windows.net | 1433 | Required only when you copy from or to Azure SQL Database or Azure Synapse Analytics; otherwise optional. Use the staged copy feature to copy data to SQL Database or Azure Synapse Analytics without opening port 1433. |
*.azuredatalakestore.net login.microsoftonline.com/<tenant>/oauth2/token | 443 | Required only when you copy from or to Azure Data Lake Store; otherwise optional. |
Some cloud databases, such as Azure SQL Database and Azure Data Lake, may require you to allow the IP addresses of self-hosted integration runtime machines in your firewall settings.
Note
Don't install Integration Runtime and Power BI gateway on the same machine, because Integration Runtime mainly uses port 443, which is also one of the main ports used by Power BI gateway.
Get the Azure Relay URL
One required domain and port that must be in your firewall's allowlist is for communication with Azure Relay. The self-hosted integration runtime uses it for interactive authoring, for example, to test a connection, browse folder lists and table lists, get schemas, and preview data. If you don't want to allow .servicebus.windows.net and want more specific URLs, you can view all the FQDNs required by your self-hosted integration runtime in the service portal. Follow these steps:
Go to the service portal and select your self-hosted integration runtime.
On the Edit page, select Nodes.
Select View Service URLs to get all FQDNs.
You can add these FQDNs to the allowlist of your firewall rules.
Note
For more information about the Azure Relay connection protocol, see Azure Relay Hybrid Connections protocol.
Copy data from a source to a sink
Make sure that you correctly enable the firewall rules on the corporate firewall, on the self-hosted integration runtime machine's Windows firewall, and on the datastore itself. Enabling these rules allows the self-hosted Integration Runtime to successfully connect to the source and sink. Enable rules for each data store involved in the copy process.
For example, to copy from an on-premises data store to a SQL Database sink or an Azure Synapse Analytics sink, follow these steps:
- Allow outbound TCP communication on port 1433 for Windows Firewall and Corporate Firewall.
- Configure the SQL database firewall settings to add the IP address of the self-hosted integration runtime machine to the list of allowed IP addresses.
Note
If your firewall does not allow outbound port 1433, the self-hosted integration runtime cannot access the SQL database directly. In that case, you can use a staged copy to SQL Database and Azure Synapse Analytics. In this scenario, you only need HTTPS (port 443) for data movement.
If your data source, sink, and self-hosted integration runtime are all in an on-premises environment, the copied data does not go to the cloud and stays strictly on-premises.
Credential store
There are two ways to store credentials when using the self-hosted Integration Runtime:
- Use Azure Key Vault. This is the recommended way to store your credentials in Azure. The self-hosted integration runtime can get credentials directly from Azure Key Vault, which can avoid some potential security issues or credential synchronization issues between self-hosted integration runtime nodes.
- Store credentials locally. The credentials are pushed to your self-hosted integration runtime machine and encrypted. When your self-hosted integration runtime recovers from a crash, you can either restore the credentials from an earlier backup, or edit the linked service and push the credentials to the self-hosted integration runtime again. Otherwise, the pipeline fails due to missing credentials when it runs on the self-hosted integration runtime.
Note
If you prefer to store the credentials locally, you must add the interactive authoring domain to your firewall's allowlist and open the port. The self-hosted integration runtime also uses this channel to obtain credentials. For the domains and ports required for interactive authoring, see Ports and firewalls.
Installation best practices
You can install the self-hosted integration runtime by downloading a Managed Identity setup package from Microsoft Download Center. See the article Move data between on-premises and cloud for step-by-step instructions.
- Configure a power plan on the host computer for the self-hosted Integration Runtime so that the computer does not go to sleep. When the host machine goes to sleep, the self-hosted integration runtime goes offline.
- Take a regular backup of the credentials associated with the self-hosted integration runtime.
- For information about automating self-hosted IR setup, see Configure an existing self-hosted IR using local PowerShell.
Important considerations
When you install a self-hosted integration runtime, consider the following:
- Keep it close to your data source, but not necessarily on the same computer
- Do not install on the same computer as the Power BI gateway
- Windows servers only (FIPS compliant encryption servers can cause jobs to fail)
- Share across multiple data sources
- Sharing across multiple data factories
Next steps
For step-by-step instructions, see Tutorial: Copy on-premises data to cloud.