The regional availability of large language models (LLMs) can provide a serious competitive advantage — the faster enterprises have access, the faster they can innovate. Those who have to wait can fall behind.
But AI development is moving so quickly that some organizations have no choice but to bide their time until models become available in their tech stack's region, a delay often rooted in resource constraints, Western-centric bias and multilingual barriers.
To overcome this critical obstacle, Snowflake today announced the general availability of cross-region inference. With a simple setting, developers can process requests on Cortex AI in a different region even if a model isn’t yet available in their source region. New LLMs can be integrated as soon as they are available.
Organizations can now privately and securely use LLMs in the U.S., EU and Asia Pacific and Japan (APJ) without incurring additional egress charges.
“Cross-region inference on Cortex AI allows you to seamlessly integrate with the LLM of your choice, regardless of regional availability,” Arun Agarwal, who leads AI product marketing initiatives at Snowflake, writes in a company blog post.
Crossing regions in one line of code
Cross-region inference must first be enabled to allow data to traverse regions (the parameter is disabled by default), and developers need to specify the regions available for inference. Agarwal explains that if both regions operate on Amazon Web Services (AWS), data privately crosses that global network and remains securely within it, thanks to automatic encryption at the physical layer.
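In practice, that is a one-time account-level setting. A minimal sketch, assuming the CORTEX_ENABLED_CROSS_REGION account parameter described in Snowflake's documentation, where the 'ANY_REGION' value permits routing to any region in which a requested model is available:

    -- Enable cross-region inference for the account (disabled by default).
    -- 'ANY_REGION' lets Cortex route requests to any region where the
    -- requested model is available.
    ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'ANY_REGION';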
If the regions involved are on different cloud providers, meanwhile, traffic crosses the public internet over encrypted transport using mutual transport layer security (mTLS). Agarwal notes that inputs, outputs and service-generated prompts are neither stored nor cached; only the inference processing occurs in the cross-region.
To execute inference and generate responses within the secure Snowflake perimeter, users must first set an account-level parameter that configures where inference will run. Cortex AI then automatically selects a region for processing whenever a requested LLM is not available in the source region.
For instance, if a user sets the parameter to "AWS_US," inference can be processed in the U.S. east or west regions; if the value is set to "AWS_EU," Cortex can route to the central EU or Asia Pacific northeast. Agarwal emphasizes that, for now, target regions can only be configured in AWS; if cross-region is enabled on Azure or Google Cloud, requests will still be processed in AWS.
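Those region groups map onto the same account parameter. A hedged sketch reusing the values named above (the 'DISABLED' value for reverting is an assumption, based on the parameter being off by default):

    -- Keep cross-region routing within U.S. AWS regions:
    ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'AWS_US';

    -- Or restrict routing to the EU region group:
    ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'AWS_EU';

    -- Assumed revert: keep all inference in the source region.
    ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'DISABLED';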
Agarwal points to a scenario in which Snowflake Arctic is used to summarize a paragraph. The source region is AWS U.S. east, but the model availability matrix in Cortex shows that Arctic is not available there. With cross-region inference, Cortex routes the request to AWS U.S. west 2, then sends the response back to the source region.
“All of this can be done with one single line of code,” Agarwal writes.
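The line Agarwal refers to is the parameter setting shown earlier; once it is in place, the inference request itself is an ordinary Cortex call. A minimal sketch of the Arctic scenario using Cortex's COMPLETE function with the snowflake-arctic model (the paragraph text is a placeholder):

    -- Summarize a paragraph with Snowflake Arctic. If Arctic is not
    -- available in the source region, Cortex transparently routes the
    -- request to an enabled cross-region and returns the response here.
    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'snowflake-arctic',
        CONCAT('Summarize the following paragraph: ', '<paragraph text>')
    );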
Users are charged credits for LLM use as consumed in the source region, not the cross-region. Agarwal notes that round-trip latency between regions depends on infrastructure and network status, but Snowflake expects that latency to be "negligible" compared to LLM inference latency.