Getting layer security hardening straight

Cloud security across the OSI model's seven layers

From Layer 3 to Layer 7 with AWS services

The OSI Reference Model is common ground for tech geeks and enthusiasts alike, and it has become the standard frame of reference for networking and cloud security teams. The model divides network communication into interconnected layers, each with its own class of functionality, ranging from the transmission and reception of raw bit streams over a physical medium to delivering the final, unpacked information to the end user in the application.

In cloud security, hardening layers 3 to 7 is paramount: getting it right keeps malicious traffic from entering the network and disrupting the architecture. However, there is still plenty of disagreement about how to apply security best practices to these layers.

Luckily, AWS has a comprehensive array of services that work together to cover the security of each of these layers. In this post, we will show some AWS services that make securing layers 3 to 7 of the OSI model easier to deal with.

Layers 3 and 4

AWS Network Firewall

AWS Network Firewall is a managed cloud security service whose primary purpose is to make it easy to deploy pervasive network protections across our AWS workloads and VPCs. This top-notch AWS service offers highly available protection and scales automatically with network traffic, which is a great advantage since there is no need to provision or manage appliances in the underlying infrastructure.

Apart from the default rules defined by AWS Network Firewall, it is possible to add custom rules tailored to the organization's security needs. These rules can be imported from reliable providers like the OISF (Open Information Security Foundation), which maintains a free, open-source engine whose rule format lets you deploy granular network protection across the entire AWS environment.

That engine is Suricata, and AWS Network Firewall accepts rules written in the Suricata format to allow or deny specific traffic into our company's network. This helps ensure that the data we receive will not harm or introduce vulnerabilities into our cloud system; for example, these rules can block traffic to unauthorized domains or from known bad IP addresses before it reaches our VPCs.
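As a rough illustration, here is a minimal boto3 sketch of loading Suricata-format rules into a stateful Network Firewall rule group. The rule group name, capacity, CIDR, and domain are placeholders, not values from this post.

```python
import boto3

firewall = boto3.client("network-firewall")

# Suricata-format rules: drop traffic from a known-bad CIDR and block one domain over TLS.
suricata_rules = """
drop ip [192.0.2.0/24] any -> any any (msg:"Block known bad CIDR"; sid:1000001; rev:1;)
drop tls any any -> any any (tls.sni; content:"bad-domain.example"; msg:"Block bad domain"; sid:1000002; rev:1;)
"""

firewall.create_rule_group(
    RuleGroupName="blog-example-stateful-rules",  # hypothetical name
    Type="STATEFUL",
    Capacity=100,
    RuleGroup={"RulesSource": {"RulesString": suricata_rules}},
    Description="Suricata rules imported into AWS Network Firewall",
)
```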


Commonly, a Route Table holds a route to an Internet gateway that connects with an application. If the company decides to implement AWS Network Firewall, some parts of our standard networking stack will need to be rethought, such as creating a dedicated Route Table for the Internet gateway.

We can think of the server infrastructure as a puzzle: the user needs to make the necessary changes for all the pieces to fit perfectly. This rearrangement allows the Network Firewall to sit as the inspection point between the Internet and our servers, in both directions.

Security Groups


Security Groups are stateful: if a connection is allowed in one direction, the return traffic is automatically allowed in the other. Their configuration consists of two kinds of rules: inbound rules and outbound rules.

With inbound rules, the user defines the ports and sources through which traffic may enter the instance; because the group is stateful, the responses to that traffic are allowed back out without a matching outbound rule. Outbound rules cover connections that originate on the server; the corresponding return traffic is automatically allowed back in.
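A minimal boto3 sketch of adding such an inbound rule; the security group ID and CIDR below are placeholders.

```python
import boto3

ec2 = boto3.client("ec2")

# Inbound rule: allow HTTPS from a trusted CIDR. Because security groups are stateful,
# the response traffic for these connections is allowed out automatically.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "203.0.113.0/24", "Description": "HTTPS from trusted range"}],
    }],
)
```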

Security groups are often misconfigured. How are yours doing?

Layers 5 and 6

AWS Session Manager

Securing applications in private subnets is sometimes hard work, because reaching them traditionally requires creating a bastion host in a public subnet. The bastion gives administrators a path into instances in private subnets, which by default are not reachable from the Internet.

There are ways to protect and secure the bastion, but they take a lot of time to set up: customers can configure the bastion to only allow users connecting through a VPN, restrict Access Control List entries to the bastion, create restrictive security groups, and generate Route Tables specific to the apps in the private subnet, among others.

Luckily, AWS has come up with Session Manager. This service enables users to access instances through AWS Systems Manager (SSM) instead of relying on hardcoded credentials or long-lived SSH keys, and to connect to an instance without exposing it to inbound Internet traffic. With this AWS service we get the additional abstraction layer the server needs; the user enables or disables access through SSM.

AWS takes care of all the tunneling and of securing traffic between these private and public subnets. Users only need IAM credentials with the right permissions to connect to the different instances.
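A hedged boto3 sketch of the same idea; in practice interactive shells usually go through the AWS CLI with the Session Manager plugin, but the underlying API is also callable programmatically. The instance ID is a placeholder.

```python
import boto3

ssm = boto3.client("ssm")

# List instances that have registered with Systems Manager (no open inbound ports needed).
managed = ssm.describe_instance_information()
for info in managed["InstanceInformationList"]:
    print(info["InstanceId"], info["PingStatus"])

# Start a session against one of them; the response contains the session ID and stream URL
# that the Session Manager plugin uses to open the tunnel.
session = ssm.start_session(Target="i-0123456789abcdef0")
print(session["SessionId"])
```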

Load Balancer

With an Elastic Load Balancer, you can balance the traffic of an application. A Load Balancer redirects traffic to different services or EC2 instances to keep routes from overloading in a multi-server architecture that receives a significant traffic load. Likewise, if there are several routes within the same or different servers, the Load Balancer can decide the destination for each one. As a result, users drastically reduce losses from throttling because traffic is constantly distributed.

A Load Balancer can work in tandem with other AWS services to further protect the architecture against DDoS attacks. These cloud security services can identify, limit, and balance the traffic between the servers and the different resources. If you receive a DDoS attack, or somebody is probing for non-existent directories or endpoints, these services will block or absorb those requests.


Lastly, suppose there is an application in a private subnet backed by a private database layer. A load balancer can then be placed in the public subnet to redirect traffic from the Internet to the application. The instances respond through the load balancer, so the servers never have to sit in a public subnet while the private layer still gets its Internet-facing entry point.
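A minimal boto3 sketch of creating such an internet-facing Application Load Balancer; the subnet and security group IDs are placeholders, and the targets would be registered separately in a target group.

```python
import boto3

elbv2 = boto3.client("elbv2")

alb = elbv2.create_load_balancer(
    Name="public-facing-alb",                         # hypothetical name
    Subnets=["subnet-aaaa1111", "subnet-bbbb2222"],   # public subnets
    SecurityGroups=["sg-0123456789abcdef0"],
    Scheme="internet-facing",
    Type="application",
)
print(alb["LoadBalancers"][0]["DNSName"])  # the public entry point for the private app
```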

Network Access Control Lists (NACLs)

An AWS Network Access Control List is a set of rules that either allows or denies access in a network environment. NACLs apply to an entire subnet (whether public or private). Unlike Security Groups, which can only allow but not deny access, NACLs are more explicit, as they can deny access to the subnet for one or several sources.

This explicitness applies to both inbound and outbound rules: a NACL can allow or deny traffic in either direction. Inbound rules, for example, can be set up to admit traffic from specific IP addresses or CIDR ranges over HTTP, HTTPS, or SSH, as applicable; the same applies to outbound rules. NACLs need explicit configuration to allow or deny traffic in and out; otherwise they will not behave as expected.

Let's look at NACLs and Security Groups at play with a practical example to see how they work together hierarchically. Suppose the NACL is set up to allow traffic into my subnet on port 443 from a given CIDR range, but the Security Group is not set up to allow traffic in on port 443; the connection is therefore dropped. Hierarchically, traffic first hits the NACL, since it works at the subnet level, and only then the Security Group, if the NACL lets it through.
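A minimal boto3 sketch of one such inbound NACL rule; the NACL ID and CIDR are placeholders.

```python
import boto3

ec2 = boto3.client("ec2")

# Inbound rule 100: allow HTTPS from a given CIDR range into the subnet.
ec2.create_network_acl_entry(
    NetworkAclId="acl-0123456789abcdef0",
    RuleNumber=100,
    Protocol="6",                 # TCP
    RuleAction="allow",
    Egress=False,                 # inbound rule
    CidrBlock="203.0.113.0/24",
    PortRange={"From": 443, "To": 443},
)
# NACLs are stateless, so a matching outbound rule (Egress=True) is also needed
# for the return traffic on ephemeral ports.
```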

Layer 7

Web Application Firewall (WAF)

At the application layer, AWS WAF comes in handy to define the rules governing the traffic that reaches our application. AWS WAF works in tandem with other AWS services such as the Application Load Balancer (ALB) and Amazon CloudFront.

On an Application Load Balancer, the WAF evaluates rule groups, such as cross-site scripting (XSS) or SQL injection protections, before letting traffic into the application. Rules can operate in three modes: count, block, or allow. Rules in count mode record and log matching requests but still let the traffic in; rules in allow mode let matching requests through; and rules in block mode reject requests that match.
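A hedged boto3 sketch of a web ACL that attaches one AWS managed rule group for SQL injection; the ACL name and metric names are placeholders.

```python
import boto3

waf = boto3.client("wafv2")

waf.create_web_acl(
    Name="app-web-acl",            # hypothetical name
    Scope="REGIONAL",              # use "CLOUDFRONT" when attaching to a CloudFront distribution
    DefaultAction={"Allow": {}},
    Rules=[{
        "Name": "aws-sqli",
        "Priority": 1,
        "Statement": {"ManagedRuleGroupStatement": {"VendorName": "AWS", "Name": "AWSManagedRulesSQLiRuleSet"}},
        "OverrideAction": {"None": {}},   # switch to {"Count": {}} to run the group in count mode
        "VisibilityConfig": {"SampledRequestsEnabled": True, "CloudWatchMetricsEnabled": True, "MetricName": "aws-sqli"},
    }],
    VisibilityConfig={"SampledRequestsEnabled": True, "CloudWatchMetricsEnabled": True, "MetricName": "app-web-acl"},
)
```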


Another AWS service that adds a further layer of protection at the application level is AWS Shield. In its standard version, the service defends against the most common and frequently occurring network and transport layer DDoS attacks that target the customer's website or applications. Its advanced version, AWS Shield Advanced, adds protection against layer 7 DDoS attacks, with features such as tailored detection based on application traffic patterns, health-based detection, advanced attack mitigation, and automatic application layer DDoS mitigation, among others.

Conclusion

Understanding how to harden each layer is paramount to keeping our cloud systems well secured, and AWS can cater to security needs from layer 3 to layer 7. This post aims to give developers some security best practices to get the most out of their cloud systems.

However, it should be noted that this list is not exhaustive, and the deployment of these services and configurations requires technical expertise. DinoCloud is a Premier AWS Partner with a team of experts that can help you achieve safer cloud deployments. Get started by clicking here to talk to one of our representatives.


Social media:

LinkedIn: https://www.linkedin.com/company/dinocloud
Twitter: https://twitter.com/dinocloud_
Instagram: @dinocloud_
Youtube: https://www.youtube.com/c/DinoCloudConsulting

11 best practices to get your production cluster working from the get-go

Containers have become the norm for the creation of cloud-native applications, and Kubernetes, commonly referred to as K8s, is undoubtedly the most well-liked container orchestration technology.

Popularity and usability are not the same things, though, as the Kubernetes system is a complicated one; it requires a steep learning curve to get started. While some of the following Kubernetes best practices and suggestions may not be appropriate for your environment, those that are can help you utilize Kubernetes more effectively and quickly.

This post will delve into 11 Kubernetes best practices to get the most out of your production cluster.

Always use the latest version

We’re kicking things off with a friendly reminder: keep your Kubernetes version updated. Apart from introducing new features and functionalities, new releases come with fixes and patches that address vulnerabilities and security issues in your production cluster, and we think this is one of the most salient advantages of keeping your K8s up to date.

However, the production team should thoroughly study and test all the new features before updating, as well as any features or functionalities that were deprecated, to avoid losing compatibility with the applications running on the cluster. Updating the version without analyzing and testing it in a safe environment could hinder production.

Create a firewall

This best practice may not come as a surprise, as having a firewall in front of your Kubernetes cluster seems like common ground, but plenty of developers still do not pay attention to it.

So here’s another friendly reminder: put a firewall in front of your API server. A firewall shields your K8s environment and prevents attackers from sending connection requests to your API server from the Internet. Whitelist IP addresses and restrict open ports using port firewalling rules.

Use GitOps workflow


A Git-based workflow is the go-to method for a successful Kubernetes deployment. It drives automation through CI/CD pipelines, improving productivity by increasing the efficiency and speed of application deployment.

Bear in mind, however, that Git must be the single source of truth for all automation so that management of the whole production cluster is unified. Another option is to choose a dedicated infrastructure delivery platform, like Argo CD, a declarative GitOps continuous delivery tool for Kubernetes.

Are you stuck with GitOps?
We can help you with that

Audit your logs

Audit your logs regularly to identify vulnerabilities or threats in your cluster. Also, it is essential to maintain a centralized logging layer for your containers.

Besides, auditing your logs will tell you how many resources you are consuming per task in the control plane and will capture key events. It’s crucial to keep an eye on the K8s control plane’s components to limit resource use. The control plane is the heart of K8s; the system depends on these parts to maintain functionality and ensure proper operation. Watch the Kubernetes API server, etcd, the controller-manager, and the scheduler, along with node components such as the kubelet, kube-proxy, and cluster DNS.

Make use of namespaces

Kubernetes comes with three namespaces by default: default, kube-public, and kube-system. Namespaces are critical for structuring your Kubernetes cluster and keeping it secure from other teams operating on the same cluster. You need distinct namespaces for each team if your Kubernetes cluster is vast (hundreds of nodes) and many teams or apps are working on it. Sometimes, different environments are created and designated to each team for cost-optimization purposes.

You should, for instance, designate separate namespaces for the development, testing, and production teams. By doing this, a developer who only has access to the development namespace cannot accidentally update anything in the production namespace. Without this separation, there is a real likelihood of unintentional overwrites by well-meaning teammates.

Resource requests and limits

Resource limits define the maximum resources a container may use, whereas resource requests define the minimum it is guaranteed. Without resource requests or limits, pods in a cluster can consume more resources than necessary.

If a pod starts using more CPU or memory on the node, the scheduler may be unable to place additional pods, and the node itself might even crash. It is customary to specify CPU in millicores for both requests and limits, and memory in megabytes or mebibytes.
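A minimal sketch with the official Kubernetes Python client, assuming you have kubectl access to the cluster; the image, namespace, and values are placeholders.

```python
from kubernetes import client, config

config.load_kube_config()

container = client.V1Container(
    name="api",
    image="nginx:1.25",
    resources=client.V1ResourceRequirements(
        requests={"cpu": "250m", "memory": "64Mi"},   # minimum the scheduler reserves
        limits={"cpu": "500m", "memory": "128Mi"},    # hard cap for the container
    ),
)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="api", labels={"app": "api"}),
    spec=client.V1PodSpec(containers=[container]),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```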

Use labels/tags

Multiple components, including services, pods, containers, networks, etc., make up a Kubernetes cluster. It is challenging to manage all these resources and track how they relate to one another in a cluster, so labels are helpful in this situation. Your cluster resources are organized using key-value pairs called labels in Kubernetes.

Let’s imagine, for illustration, that you are running two instances of the same kind of program. Despite having identical names, separate teams use each of the applications (e.g., development and testing). By creating a label that uses their team name to show ownership, you may assist your teams in differentiating between the comparable applications.

Role-Based Access Control


Your Kubernetes cluster is vulnerable to attack, just like everything else. Hackers frequently look for weaknesses in the system to gain access. So maintaining the security of your Kubernetes cluster should be a top priority, and verifying that Role-Based Access Control (RBAC) is in use in Kubernetes is a first step.

Each user in your cluster and each service account running in your cluster should have a role. RBAC roles contain multiple permissions that a user or service account may exercise. Multiple users can share the same role, and each role can hold various permissions.
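A minimal sketch with the Python client: a namespaced, read-only role; the namespace and role name are placeholders.

```python
from kubernetes import client, config

config.load_kube_config()
rbac = client.RbacAuthorizationV1Api()

role = client.V1Role(
    metadata=client.V1ObjectMeta(name="pod-reader", namespace="development"),
    rules=[client.V1PolicyRule(api_groups=[""], resources=["pods"], verbs=["get", "list", "watch"])],
)
rbac.create_namespaced_role(namespace="development", body=role)
# A RoleBinding then attaches this role to a user or service account
# (e.g. kubectl create rolebinding pod-reader-binding --role=pod-reader --user=jane -n development).
```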

Track network policies

Network policies serve to limit traffic between objects in the K8s cluster. By default, all containers can communicate with each other over the network, which poses a security concern if bad actors gain access to a container and use it to move between objects in the cluster.

Just as security groups in cloud platforms limit access to resources, network policies can govern traffic at the IP and port level. Typically, all traffic should be denied by default, and rules put in place to permit only the necessary traffic.
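A minimal sketch of that default-deny approach with the Python client; the namespace is a placeholder, and further policies would then whitelist the required flows.

```python
from kubernetes import client, config

config.load_kube_config()

deny_all_ingress = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="default-deny-ingress", namespace="development"),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(),   # empty selector = every pod in the namespace
        policy_types=["Ingress"],                # no ingress rules listed, so all ingress is denied
    ),
)
client.NetworkingV1Api().create_namespaced_network_policy(
    namespace="development", body=deny_all_ingress
)
```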

Are your application's security-sensitive areas being watched over?

Use readiness and liveness probes

Readiness and liveness probes function like health checks. A readiness probe verifies that a pod is up and operational before traffic is routed to it; requests are withheld from the pod until the probe confirms it is ready.

A liveness probe confirms that the application is still alive: it pings the pod and expects a response. If nothing comes back, the application is no longer running on that pod, and after the check fails Kubernetes restarts the container so the application can recover.
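A minimal container-spec fragment with the Python client; the /healthz path, port, and image are assumptions about the application, not values from this post.

```python
from kubernetes import client

container = client.V1Container(
    name="api",
    image="my-registry/api:1.0",     # hypothetical image
    readiness_probe=client.V1Probe(
        http_get=client.V1HTTPGetAction(path="/healthz", port=8080),
        initial_delay_seconds=5,
        period_seconds=10,           # traffic is withheld until this probe succeeds
    ),
    liveness_probe=client.V1Probe(
        http_get=client.V1HTTPGetAction(path="/healthz", port=8080),
        initial_delay_seconds=15,
        period_seconds=20,
        failure_threshold=3,         # after 3 failures the kubelet restarts the container
    ),
)
```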

Service meshes

A service mesh is a dedicated infrastructure layer you can add to your applications. It lets you transparently add capabilities like observability, traffic management, and security without adding them to your code. The phrase “service mesh” refers both to the software you use to implement this pattern and to the security or network domain created by applying it.

As a distributed service deployment grows in size and complexity, such as in a Kubernetes-based system, it becomes harder to understand and manage. Its requirements may include metrics, monitoring, load balancing, failure recovery, and service discovery. A service mesh also frequently takes care of more complex operational needs like end-to-end authentication, rate limiting, access control, encryption, and canary deployments.

The ability to communicate between services is what enables distributed applications. As the number of services increases, routing communication within and between application clusters becomes more difficult.

These Kubernetes best practices are just a tiny sample of all those available to make Kubernetes a more straightforward and beneficial technology to use while developing applications. As we said in the introduction, Kubernetes requires a steep learning curve to get started.

Even with the ever-increasing number of tools and services to speed up the procedures involved, that can be overwhelming for development teams already swamped with the numerous duties required in modern application development. But if you start with these pointers, you’ll be well on your way to adopting Kubernetes to advance your complex application development initiatives.

Prevent and reduce vulnerabilities in your Kubernetes cluster

In DinoCloud, we have an excellent team of engineers and architects with vast experience in Kubernetes environments. Let’s find out how we can help you overcome the difficulties in the development of your cloud application.


Social media:

LinkedIn: https://www.linkedin.com/company/dinocloud
Twitter: https://twitter.com/dinocloud_
Instagram: @dinocloud_
Youtube: https://www.youtube.com/c/DinoCloudConsulting

AWS Cloud Security Misconfigurations

Exploiting AWS cloud security misconfigurations is big business, an industry in its own right. Bad actors are snooping, copying, and selling data and ransoming companies 24/7. Organizations and governments are offering big rewards to expose and arrest these online opportunists. Mistakes, oversights, or ill-informed cloud service configuration choices are at the center of the vulnerabilities that make the nightly news every week. To compound the problem, research shows most environments are run under a “set it and forget it” policy, where reviews and audits are never used to ferret out the original mistakes.


It is vital to employ experienced AWS cloud professionals to architect, engineer, implement, maintain, and audit your cloud environments. Small misconfigurations lead to mountainous liability.

Let Us Configure Your Cloud

Data Exposure Risks

As AWS’ customer list of enterprises that upload and distribute data across the globe grows exponentially, the cost-effective, productivity-enhancing services come with risks of misconfiguration vulnerabilities.  

Misconfigured Amazon S3 buckets are the most common security threat in AWS cloud environments. When this cloud object storage is poorly configured, inadvertent public read access makes data breaches possible, and public write access exposes code to malware injection or to encryption of data to hold a company ransom.


What is Amazon S3?

Amazon Simple Storage Service (S3) is an object storage service. Fifteen years after its introduction, it offers industry-leading scalability, data availability, security, and performance. All organizations in every industry use Amazon S3 to store and protect data for various use cases:  data lakes, websites, mobile apps, backup and restore, archive, enterprise applications, IoT devices, and big data analytics.

Recent AWS Cloud Breaches Due to Amazon S3 Misconfigurations

In October 2021, Twitch, Amazon's live streaming e-sports platform, blamed “an error” in a server configuration change for exposing sensitive information that enabled a hacker to leak data from its source code repository along with creator payout information.

In August 2021, SeniorAdvisor, a ratings and review website for senior care services in the US and Canada, was found by ethical hackers at WizCase to have a bucket configuration that left over 1 million files (182 GB) of personal contact information from leads and reviews exposed.

WizCase researchers also exposed more than a terabyte of data tied to PeopleGIS, including 1.6 million files with city residents’ sensitive personal data, building plans, city plans, and local property data. It is unclear whether the Amazon S3 bucket misconfigurations were due to actions by PeopleGIS or the municipalities it serves.

In March 2021, Premier Diagnostics, a Utah-based COVID testing company, exposed patients’ data via improperly configured Amazon S3 buckets. The company exposed over 50,000 scanned documents containing customers’ personal information, including images of driver’s licenses, passports, and medical insurance cards.

How to Minimize Data Exposure Risks in Amazon S3

Data Encryption

Encrypt all data in transit to and from S3 using SSL/TLS, and encrypt data at rest in S3 using server-side or client-side encryption.
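A minimal boto3 sketch of setting default server-side encryption on a bucket; the bucket name and KMS key alias are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Default server-side encryption with a customer-managed KMS key for everything stored in the bucket.
s3.put_bucket_encryption(
    Bucket="example-data-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": "alias/example-key",  # hypothetical key alias
            }
        }]
    },
)
# In-transit encryption comes from using the HTTPS endpoints (and can be enforced with a
# bucket policy that denies requests where aws:SecureTransport is false).
```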

Enable Bucket Versioning 

Versioning adds a layer of data protection against unwanted user actions and application failures. With bucket versioning enabled, Amazon S3 keeps every version of an object, so simultaneous write requests to the same object, accidental overwrites, and deletions can all be recovered.

Enable MFA Delete

Strengthen security further by enabling Multi-Factor Authentication (MFA) Delete in a bucket's versioning configuration. With MFA Delete enabled, the bucket owner must include the ‘x-amz-mfa’ request header when requesting to permanently delete an object version or change the bucket’s versioning state, and such requests must be made over HTTPS. Requests that fail to meet these requirements are rejected.
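A hedged boto3 sketch that enables versioning and MFA Delete in one call; note that only the credentials of the bucket owner (root account) can enable MFA Delete, and the MFA string combines a device serial number with a current token code (both placeholders below).

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_versioning(
    Bucket="example-data-bucket",
    MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",  # "<serial> <code>" placeholder
    VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
)
```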

Use Amazon S3 Object Lock

Prevent an object from being overwritten or deleted for a fixed period, or indefinitely, by using S3 Object Lock. It stores objects using a write-once-read-many (WORM) model, adding another layer of protection against object changes and deletions.

Utilize Multi-Region Application 

Multi-Region Application Architecture enables users to create fault-tolerant applications with failover to backup regions. Using S3 Cross-Region replication and Amazon DynamoDB Global Tables, the service asynchronously replicates application data across primary and secondary AWS regions. 

Lock Down Public Access to Amazon S3

Use Block Public Access settings to override other Amazon S3 permissions and prevent accidental or intentional unauthorized exposure. These settings help administrators centralize account-level controls for maximum protection and keep Amazon S3 secure.

New buckets, objects, and access points do not allow public access by default, but Amazon S3 users can modify policies and permissions to allow it, potentially exposing sensitive data. Unless a dataset specifically requires read or write access over the Internet via a URL to your Amazon S3 bucket, public access should not be allowed.
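A minimal boto3 sketch of turning on all four Block Public Access settings for a bucket; the bucket name is a placeholder.

```python
import boto3

s3 = boto3.client("s3")

s3.put_public_access_block(
    Bucket="example-data-bucket",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
# The same configuration can be applied account-wide through the s3control client
# (put_public_access_block with an AccountId) to centralize the control.
```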

Identify Bucket Policies that Allow Wildcard IDs

Identify Amazon S3 bucket policies that allow a wildcard identity, such as a Principal of “*”, or a wildcard action “*”, which effectively lets anyone perform any action in Amazon S3. Also audit Amazon S3 bucket access control lists (ACLs) that grant read, write, or full access to “Everyone” or “Any authenticated AWS user.” To ensure data stays non-public, bucket policies must grant access only to fixed, non-wildcard values.
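A hedged audit sketch: list every bucket in the account and flag policy statements whose Principal is a wildcard. It only covers the simplest wildcard forms and is meant as a starting point.

```python
import json
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        policy = json.loads(s3.get_bucket_policy(Bucket=name)["Policy"])
    except ClientError:
        continue  # no bucket policy attached
    for statement in policy.get("Statement", []):
        principal = statement.get("Principal")
        if principal == "*" or principal == {"AWS": "*"}:
            print(f"{name}: statement {statement.get('Sid', '<no Sid>')} allows a wildcard principal")
```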

See AWS detailed instructions on making policies non-public.

Audit IAM Systems

AWS IAM systems are your first line of defense in securing access to sensitive data and your applications. Deploying default AWS IAM settings will jeopardize your organization. Even if you avoid this common pitfall and set up AWS IAM policies that effectively secure a resource, that protection can become outdated. For example, if a user changes departments or roles within a department, access rights should change to match the new role’s need to access data and applications. Regularly audit AWS IAM rules to protect your environment better.

Use Tools to Inspect Implementations 

Use tools to reduce human error. More tools will arise over time, but currently, we recommend: 

  • AWS Trusted Advisor, an online tool that offers real-time guidance and support when provisioning resources on AWS, inspects your Amazon S3 implementations to strengthen your efforts to prevent public access.  
  • Employ real-time monitoring through the s3-bucket-public-read-prohibited and  s3-bucket-public-write-prohibited managed AWS Config Rules. 
  • Hire DinoCloud to complete a Well-Architected Framework review and remediate your Amazon S3 implementation.

Enforce Least Privilege Access 

Add a layer of protection against unauthorized access to Amazon S3 by enforcing least-privilege access. This means granting identities (users, roles, services) only the permissions required to perform their tasks. The principle prevents malware spread, reduces the potential for cyberattacks, aids data classification, helps demonstrate compliance, and promotes user productivity.


Focus on these best practices:

  • Separation of duties – avoid conflicts of interest between people and applications by ensuring that responsibilities, and the privileges granted to carry them out, do not leave the organization open to fraud, theft, circumvention of security controls, or other risks. Failure to separate functions can result in toxic combinations where privileges can be abused; for example, a user with permission to create an invoice may also have been granted the privilege to pay it.
  • Inactive identities – review access privileges for lack of login activity and ideally remove them; at a minimum, monitor them closely, as bad actors can use them to gain access without the owners’ knowledge.
  • Privilege escalation – this occurs when vulnerabilities, often identity and access management (IAM) misconfigurations, are exploited either horizontally, with one user using its privileges to access another user’s account, or, even more damaging, vertically, where a user accesses accounts with higher-level privileges such as an account administrator.

AWS also provides tools that aid you in implementing least-privilege access.

Is Your Infrastructure
Well-Architected™?

Software Supply Chain Risks

High-profile incidents implicating software supply chains, such as the 2020 SolarWinds and 2021 Log4j breaches, are on the rise. Securing the software supply chain is becoming a more frequently addressed topic among IT teams, but regrettably many are not considering the public cloud as part of that chain. It is understandable how this oversight happens, since the public cloud feels more like “infrastructure” than “software.” But cloud security risks can rain (pun intended) on your software supply chain, causing a storm just as damaging as one in your software applications.

Software supply chain infiltration is lucrative and can be scaled quickly and efficiently. By infiltrating a soft target and then exploiting poor configurations, bad actors can plant malware in multiple organizations’ environments for later launch. With access to those environments, attackers can expand their foothold, most often undetected.

Public cloud security breaches happen, often exposing data stored in the cloud without giving hackers direct access to entire IT environments. But as the SolarWinds breach proved, that can happen too. Consider these reasons to treat the public cloud as part of your software supply chain:

  • Clouds are more than just infrastructure: Though Infrastructure-as-a-service (IaaS) may be the primary offering, most cloud vendors also deliver software-as-a-service (SaaS) applications.
  • Infrastructure security breaches are painful: Even if your use of the cloud is infrastructure services only, vulnerabilities in the cloud platform can expose your data or applications to malicious assaults.
  • Public clouds get hacked: As you have no doubt read in IT publications and sometimes mainstream news, it is possible for your public cloud to be attacked.

Without tracking the risks and breaches that affect the cloud environments you use, you are opening your software supply chain up to vulnerabilities.

Securing the Cloud in Your Software Supply Chain

Reduce risks of using the public cloud as part of a software supply chain:

Know your cloud environment

Unless you know what you have, you cannot monitor it, much less secure it. It is a challenging task, and it gets harder the larger the organization, but it is vital. To organize the effort, enforce tagging rules for your cloud resources to better track workloads, and periodically conduct audits to map cloud workloads.

Track your cloud platform(s) security incidents

Once you understand your cloud environment, you will have a list of cloud platforms you must stay well informed about. For each security incident you become aware of, learn as much as possible to determine whether any of your workloads are impacted. Read IT publications and general news, and follow the security thread of your cloud provider’s blog; for AWS, that is the AWS Security Blog.

Minimize data exposure

Store less data in the public cloud to reduce the impact on your organization and customers if there is a breach. Consider a hybrid cloud architecture.

Spread workloads across multiple cloud accounts

Splitting workloads across different accounts also reduces the impact of any breaches that may occur. 

Do not blindly trust third-party containers

If you deploy cloud applications using containers, you know that scanning for vulnerabilities before deployment is standard best practice. However, many organizations blindly trust containers from third parties. This often happens when deploying sidecar containers to connect resources like third-party logging services. A reliable company or open-source project is no reason to relax your container-scanning practices: scan all containers before deployment.

Don’t Be a Victim of AWS Cloud Security Misconfigurations

DinoCloud’s architects and engineers are experienced AWS Partners. Let’s work together to secure your infrastructure while taking advantage of all the benefits of the cloud.

Social media:

LinkedIn: https://www.linkedin.com/company/dinocloud
Twitter: https://twitter.com/dinocloud_
Instagram: @dinocloud_
Youtube: https://www.youtube.com/c/DinoCloudConsulting


What is AWS Amplify?

AWS Amplify is a set of purpose-built tools and features that simplifies UI and backend creation of full-stack mobile and web applications. Its code libraries, ready-to-use components, and built-in CLI (command line interface) boost performance and quality in your app development lifecycle. AWS Amplify supports a wide variety of languages for web and mobile application development.

The AWS Amplify development framework provides use cases for faster, easier development and deployment of high-impact mobile and web apps. From authentication to data to AI/ML (artificial intelligence and machine learning), Amplify’s toolset streamlines the process, and your mobile and web apps perform at their peak, powered by AWS services.

Cloud is powerful, but necessarily complex. AWS Amplify shields your development and deployment cycles from that complexity. 

Here are reasons we think you should use AWS Amplify for your Agile Mobile and Web app development. 

Free then Pay-As-You-Go Development Platform

A much-used feature of paid AWS services is the usage-based payment model. AWS Amplify offers this flexible and cost-efficient model, enabling you to save money by paying only for the services you use. Even better, Amplify offers generous service tiers for free: you only start paying once your 12-month free period is over or once you exceed the free tier’s technical thresholds.

No Cloud Expertise Required

Within the AWS Amplify ecosystem you can take advantage of the AWS Amplify Studio tool. This feature generates clean React code from changes made visually in the console, which lets you focus less on detailed coding and more on great user experience and business functionality, leaving efficiency, performance, and scalability concerns to the tool. Amplify is a low-code solution with guided workflows to set up best-in-class backends, handling complexity without slowing development.

Speeds Prototyping

AWS Amplify puts all the tools and processes at your developers’ fingertips so they and the product development team can experiment with new functionality, adding and sunsetting features quickly as customer or focus-group feedback shows what the market wants and what it doesn’t. The ability to dramatically accelerate development and deployment cycles brings an incredible competitive advantage in agility.

No Loss of Architectural Control 

Amplify’s use-case-centric approach to building cloud backends is instantly attractive to new developers and to those new to the cloud. It can worry more seasoned developers and architects, who are likely to raise concerns about full control over the code. But with Amplify there is no need for concern, because the platform offers “escape hatches” that let a team fine-tune lower-level APIs if ever required. These shortcuts enable customization of specific API calls sent to backend services.

Integrate with other AWS Services

AWS Amplify provides the user interface (UI) elements for cloud-connected workflows, a CLI toolchain, and code libraries. To integrate with other AWS services, these libraries can be used together or independently without needing to adjust the UI of the existing app. Combining an existing frontend app with Amplify can be done by adding a few lines of code.

There are a few approaches to joining an existing frontend application with AWS Amplify; for example, a complete backend rebuild using the Amplify toolchain.

Build your cloud native application with DinoCloud

AWS Amplify SDK Integrates with the most used App Development languages

Even though the SDK is fully available for Node developers, AWS Amplify is easy to integrate with the most popular web and mobile application development frameworks, including native languages (Android and iOS), Flutter, and React Native.

Automate Cloud Backends

AWS Amplify improves app performance by offering built-in support for backend management. The Amplify CLI lets you auto-configure cloud backends by connecting your backend via the libraries. Easy-to-use CLI commands let you add more cloud services and features and efficiently make changes to your AWS-managed backends. The CLI processes and workflows are seamless, accelerating development.

Common Workflows Faster and Standardized

Amplify UI offers dozens of pre-built components for common workflows such as login and logout. Adding them to your app is as simple as dragging and dropping in the component. Save time, get standardization across applications, and put your development time to better use on critical functions and innovation.

Useful Web-Based Analytics

AWS Amplify’s web-based analytics dashboard offers up-to-date metrics on user sessions, attributes, and in-app events. Developers, designers, and project managers all benefit from the insight this dashboard provides.

AWS Amplify brings to full-stack mobile and web apps the security, reliability, and availability that AWS services are renowned for.

Deploy secure and efficient apps quickly without having to manage the underlying infrastructure.

Learn how DinoCloud has helped its clients build cloud native web and mobile  applications.
Connect with our team to talk about your next  project!

Social media:

LinkedIn: https://www.linkedin.com/company/dinocloud
Twitter: https://twitter.com/dinocloud_
Instagram: @dinocloud_
Youtube: https://www.youtube.com/c/DinoCloudConsulting

Written by Guadalupe Vocos & Pedro Bratti | Cloud & DevOps Engineer @ DinoCloud

You can deploy an Amazon S3 bucket to work as a website, with a CloudFront distribution for content delivery and Route 53 for Domain Name System (DNS) hosting.

Why choose a static website?

Hosting static websites is becoming more and more popular, but what does it mean to be static? It means that your site consists of a set of “pre-built” files (HTML, JS, and CSS) that are served directly on request. This, plus the resources that AWS offers, lets us have a serverless, flexible, scalable, high-performing, secure, and low-cost infrastructure.

Before you begin:

As you follow the steps in this example, you will work with the following services:

  • CloudFront: Distribution and Origin Access Identity.
  • Route 53: Hosted Zone and Records.
  • S3: Bucket.

You will need to have these prerequisites before starting the steps:

  • Route 53: Domain Name already registered.
  • Certificate Manager: Certificate requested (Optional in case you want to secure communication through the HTTPS protocol).

Step 1: S3 Bucket with Static Website Hosting

  • Sign in to the AWS Management Console and open the Amazon S3 console at AWS S3.
  • Choose Create Bucket, enter the Bucket Name (for example, medium.dinocloudconsulting.com) and on region select us-east-1 and Create.
  • Now you can upload your index.html to the newly created bucket (Step 1 can also be scripted, as sketched below).
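A minimal boto3 sketch of Step 1, reusing the example bucket name from above; the index.html path is assumed to be local.

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# us-east-1 buckets need no CreateBucketConfiguration.
s3.create_bucket(Bucket="medium.dinocloudconsulting.com")

# Enable static website hosting with index.html as the index document.
s3.put_bucket_website(
    Bucket="medium.dinocloudconsulting.com",
    WebsiteConfiguration={"IndexDocument": {"Suffix": "index.html"}},
)

# Upload the site entry point with the right content type.
s3.upload_file(
    "index.html", "medium.dinocloudconsulting.com", "index.html",
    ExtraArgs={"ContentType": "text/html"},
)
```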

Step 2: Route 53 Create Hosted Zone

  • Sign in to the AWS Management Console and open the Amazon Route 53 console at AWS Route 53.
  • Choose Create Hosted Zone, enter the Name, select the Type Public Hosted Zone and Create.

Step 3.1: CloudFront Create and Configure Distribution and OAI.

  • Sign in to the AWS Management Console and open the Amazon CloudFront console at AWS CloudFront.
  • Choose Create Distribution, in the Origin Settings section, for Origin Domain Name, enter the Amazon S3 website endpoint for your bucket (for example, medium.dinocloudconsulting.com.s3-website-us-east-1.amazonaws.com)
  • Under Bucket Access select Yes, use an OAI, Create new OAI and select Yes, update the bucket policy.
  • (Optional) For SSL Certificate, choose Custom SSL Certificate (example.com), and choose the custom certificate that covers the domain
  • Set Alternate Domain Names (CNAMEs) to the root domain. (for example, example.com).
  • In Default Root Object, enter the name of your index document, for example, index.html.
  • Leave the rest of the properties as default and Create.

Step 3.2: CloudFront Configure the Error page properties.

  • Select the Distribution, already created, and go to the properties tab called Error page.
  • Set the following properties:
    • HTTP error code: 403: Forbidden.
    • Customize error response: Yes.
    • Response page path: /index.html
    • HTTP response code: 200: OK

Step 4: Route 53 create a Record.

  • Sign in to the AWS Management Console and open the Amazon Route 53 console at AWS Route 53.
  • On the Hosted Zone which you already have created, select Create Record.
  • Select Record Type: A and under Value field, check Alias and select the CloudFront Distribution domain name.
  • Wait a couple of minutes for the DNS to propagate and search the site on your browser.

All ready! You now have your static website up and running.
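If you prefer to script Step 4, here is a hedged boto3 sketch; the hosted zone ID and distribution domain are placeholders, while Z2FDTNDATAQYW2 is the fixed hosted zone ID Route 53 uses for all CloudFront alias targets.

```python
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",        # your hosted zone from Step 2
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "medium.dinocloudconsulting.com",
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": "Z2FDTNDATAQYW2",             # CloudFront alias zone
                    "DNSName": "d111111abcdef8.cloudfront.net",   # your distribution domain
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    },
)
```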

At DinoCloud, we take care of turning a company’s current infrastructure into a modern, scalable, high-performance, and low-cost infrastructure capable of meeting your business objectives. If you want more information, optimize how your company organizes and analyzes data, and reduce costs, you can contact us here.

Guadalupe Vocos

Cloud & DevOps Engineer
@DinoCloud

Pedro Bratti

Cloud & DevOps Engineer
@DinoCloud


Social Media:

LinkedIn: https://www.linkedin.com/company/dinocloud
Twitter: https://twitter.com/dinocloud_
Instagram: @dinocloud_
Youtube: https://www.youtube.com/c/DinoCloudConsulting

AWS AppSync + GraphQL

Written by Nicolás Tosolini | Associate Software Engineer @ DinoCloud

What is AppSync, and what is it used for?

AppSync is an AWS service that simplifies application development. If you are building a front end, for example, you usually need to deploy an API as quickly as possible; AppSync lets you deploy that API in a couple of clicks, and Amazon takes care of the maintenance. It is very efficient for front-end development since it gives you flexible and secure access to data from one or more sources. AppSync lets you take data coming from different sources, gather it, modify it, and return only the relevant information, which ends up being an essential feature because it makes applications faster and more performant.

AWS AppSync
AWS AppSync. Source: AWS

Another quality AppSync provides is management and synchronization of application data in real time. Subscriptions, for example, keep the server and clients connected so the client is notified as soon as data changes, which is useful for building chats since data is constantly delivered in real time and you have precise access to it.

On the other hand, AppSync also lets you access and modify data offline. For example, if you have a mobile application and you run out of data or have no Wi-Fi, you would normally expect to stop using it; with AppSync it can continue to work. Once the Internet connection is back, AppSync takes care of merging the data from when you were offline, resolving conflicts, and syncing everything to the database.

What is AWS AppSync?
What is AWS AppSync? Source: AWS

Another clear example: whether the data comes from enterprise apps, web apps, mobile apps, or IoT devices, AppSync takes all of it and processes it with a GraphQL schema and resolvers. The source of information can come from anywhere, whether or not it is on AWS, which is valuable if you need information spread across four different databases, since with AppSync you can gather everything in one call.

AppSync uses GraphQL as its query language, allowing the connection of different information sources, whether they are inside or outside of AWS. It offers a high degree of security, since it can use Amazon Cognito, IAM, or the other Amazon authentication options, as well as subscriptions for real-time apps and serverless caching.

GraphQL and its functions

GraphQL simplifies data access and queries: the client requests only the data it needs, in the format it wants. Searching, filtering, and querying data are the three dominant aspects of this language. It gathers all the information, filters it, and delivers it to you, increasing the system’s speed enormously, especially for a mobile application that is not connected to Wi-Fi.

Another feature of GraphQL is subscriptions, which provide real-time updates and access. They let you build instant applications, for example a chat where, when you send information, the other user has to see it immediately, allowing a lot of data to be synchronized at the right time.

AppSync is also relevant in caching since it gives you the ability to cache endpoints and resolvers, increasing the response speed.   

What are the benefits of using AppSync?

  • It is effortless, since you can get it up quickly and you do not have to do server maintenance. If the application grows, it scales very quickly and you have the entire AWS infrastructure behind it.
  • It has the advantage of providing you access offline and in real-time.
  • And finally, unified access, which means you have resolvers, Lambdas, and all services and data in one place.

Examples of possible applications to create

In a travel application, where the user may or may not be connected to the Internet (especially when traveling abroad), AppSync lets them keep using the application: it will not throw errors, and when connectivity returns it will synchronize all the information.

Unified access to data
Unified access to data. Source: AWS https://aws.amazon.com/es/appsync/ 

If it is a social media app, it will use Lambda resolvers and access different sources of information. Facebook today holds an enormous amount of information, and this approach lets you reach all of those information points.

Real-time collaboration
Real-time collaboration. Source: AWS https://aws.amazon.com/es/appsync/ 

Also, in the case of chat apps, everything happens thanks to so-called real-time subscriptions, where the user needs an answer instantly. Thanks to subscriptions, requests to these services are much faster and more agile. It also connects to Amazon Cognito, which adds a further layer of security.

Real-time chat application
Real-time chat application. Source: AWS https://aws.amazon.com/es/appsync/ 

What is the difference between GraphQL and Rest API?

GraphQL has the characteristic of having a single endpoint, which lets you do everything from there: you send it a query, mutation, or subscription, and the server takes care of returning the result. It also lets you fetch only the information you want. For example, if you query for a single user by ID and specify that you only need three specific fields, it will return exactly what you asked for and nothing else. And because there is a single endpoint, you only have to make one call to get everything you need. With a REST API, the information is sometimes stored in different places, so several calls to different endpoints are required, and they often return information you do not need. That does not happen with GraphQL: with the right configuration it returns only the relevant information, in the shape you want, which is very efficient, especially when handling large volumes of information.

GraphQL:

  • A query language for APIs.
  • It provides you with a complete and understandable description of your API data.
  • It offers you the possibility of obtaining only the data you need in a single request.

Some definitions:

  • Schemas: define the entities, how they relate to each other, and which ones are available to each client.
  • Query: performs queries against the single entry point (endpoint).
  • Mutation: inserts, deletes, and edits elements.
  • Subscriptions: keep a real-time connection with the server so the client is immediately informed about important events.

GraphQL Query Language

  • Queries read data
  • Mutations change data
  • Subscriptions subscribe to real-time data (see the example below)
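A hedged sketch of a GraphQL query sent to an AppSync endpoint that uses API-key authentication; the endpoint URL, API key, and schema fields (getUser, id, name, email) are placeholders, not part of a real API.

```python
import requests

APPSYNC_URL = "https://example123.appsync-api.us-east-1.amazonaws.com/graphql"  # hypothetical endpoint
API_KEY = "da2-xxxxxxxxxxxxxxxxxxxxxxxxxx"                                      # hypothetical API key

query = """
query GetUser($id: ID!) {
  getUser(id: $id) {
    id
    name
    email        # only the three requested fields come back
  }
}
"""

response = requests.post(
    APPSYNC_URL,
    json={"query": query, "variables": {"id": "123"}},
    headers={"x-api-key": API_KEY},
)
print(response.json())
```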

Cognito: What is it, and what is it for?

  • Amazon Cognito provides authentication, authorization, and user management for your mobile and web applications. 
  • Users can sign in directly with a username and password or through a third party such as Facebook, Amazon, Google, or Apple (federation).
  • It provides user pools and identity pools.

At DinoCloud, we take care of turning a company’s current infrastructure into a modern, scalable, high-performance, and low-cost infrastructure capable of meeting your business objectives. If you want more information, optimize how your company organizes and analyzes data, and reduce costs, you can contact us here.

Nicolás Tosolini

Associate Software Engineer
@DinoCloud


Social Media:

LinkedIn: https://www.linkedin.com/company/dinocloud
Twitter: https://twitter.com/dinocloud_
Instagram: @dinocloud_
Youtube: https://www.youtube.com/c/DinoCloudConsulting

Data Lake at AWS

Written by Francisco Semino | Lead Solutions Architect @ DinoCloud

What is a Data Lake?

A company often has data distributed across different silos (on-premise databases), making it difficult to gather that data, obtain information from it, and analyze it to make business decisions. A Data Lake provides the ability to centralize all that data in one place. This allows all the data in the Data Lake to be processed and then turned into statistics and analysis that inform a business decision. You can create charts, dashboards, and visualizations that show how the company, its products, and its customers are doing, among many other options, in addition to applying Machine Learning to make predictions and base decisions on them.

A Data Lake is a repository where you can land structured data (such as data from databases) and unstructured data (from Twitter, logs, etc.). You can also add images and videos (real-time or recorded). One of the properties of a Data Lake is that it can scale up to exabytes, a considerable amount of information. That does not mean you need a lot of data to have a Data Lake; there are no minimums or maximums.

It serves both small and large companies because of its low cost: you pay only for what you use. Being a cloud service, there is no need to pay for storage “just in case”; you pay as you go, according to use. Whether the Data Lake grows by 5 GB per month or 5 TB per month, you pay only for that usage.

A little history

What is known as a Data Warehouse is a company's traditional Business Intelligence system, and one of its properties is that it only accepts structured data. It involves a large investment because you have to pay for capacity (since the Data Warehouse does its own processing). As a result, it was historically used only by large companies, due to the large investment required.

Because of its high costs, and because its clusters handle processing as well as storage with far less capacity than a Data Lake, a Data Warehouse cannot scale to exabytes.

The most significant difference, though, is that in a Data Warehouse the user defines the schema before loading data; that is, you must know and define what is going to be sent before loading it, and the data is then analyzed by a Business Intelligence tool that produces dashboards, visualizations, and so on.

This does not mean that the Data Lake will supplant the Data Warehouse; rather, it complements it in cases where the company or the architecture needs one, or already owns one and does not want to get rid of it.

Data Warehouse process for further analysis
Data Warehouse process for further analysis.

So then, there are three possible architectures:

  1. The company already has a Data Warehouse and wants to build a Data Lake. This can be done in a complementary way: create the Data Lake separately, send the data from the Data Warehouse into the Data Lake, and use the Data Lake's tools for Big Data processing, Machine Learning, and other workloads the Data Warehouse alone could not handle.
  2. The company has no Data Warehouse but needs one, plus a Data Lake, because its Business Intelligence tool only supports connections to a Data Warehouse, where the data is structured. The recommendation is to stand up the Data Lake and create a separate Data Warehouse, doing all the data ingestion through the Data Lake and then sending the already transformed information to the Data Warehouse, so the Business Intelligence tool consumes it directly from there. In turn, all the data remains available for Big Data processing and the other tools the Data Lake enables.
  3. Finally, and simplest: only a Data Lake is required. A Data Warehouse is not needed because the Business Intelligence tool supports connections directly to the Data Lake, so you can just stand up the Data Lake and do all the Business Intelligence and Big Data processing from there.

Data Lake Properties

The most important property is that data can be brought in from wherever it lives, easily, securely (it travels encrypted), and at low cost. Everything can be migrated into a Data Lake: from on-premise systems, from other clouds, from AWS itself, etc.

On top of that, other kinds of data movement are possible. If the application works in real time, for example if you need to ship application logs, or Twitter posts to see what customers think of a product or service, this can be done in real time thanks to a number of AWS services.

What is a Data Lake?
What is a Data Lake? Source: AWS

Another possibility is that a company streams video in real time and wants the application to keep functioning normally while those streams are also stored in the Data Lake and analyzed in real time.

Once the data is ingested, the important part begins: analyzing it, taking advantage of the Data Lake, and making business decisions that affect the company, improve it, improve its products, and so on. There are two main branches. The first is analytics on the data: showing it on dashboards, transforming it, building visualizations, and extracting information.

The second branch is Machine Learning, to make predictions from the information. There are AWS services aimed at companies that have Machine Learning experts, and services that let small or medium-sized companies avoid hiring one. For example, Amazon Comprehend can understand natural human language and turn it into insights: understand what specific tweets are saying and whether they rate a product positively, negatively, or neutrally. There are services like Amazon Rekognition to recognize faces or objects in, for example, a live stream. This is a great advantage today because it allows small and medium-sized companies to have a Data Lake and exploit it without a significant investment.

We are often asked at DinoCloud: “How long until my Data Lake is up and running?” The answer is usually no more than two weeks, starting with the essential functions, exploiting the data a little, seeing what the company needs, and then building dashboards, visualizations, and Machine Learning.

Another common query is: “Would developing a Data Lake affect my application or service running in the cloud?” The answer is simply no. They are entirely complementary efforts that run in parallel. An application can continue to be developed while a Data Lake is built alongside it, without disturbing it or degrading its performance, because requests are not made directly against the database the application is using. Instead, AWS services extract the information from a database backup or a read replica, for example, without affecting the application and at low cost.

AWS SERVICES

S3

Where do I keep the data, where do I store it, what is my Data Lake, physically? The answer is Amazon Simple Storage Service (S3), Amazon's object storage. It is virtually unlimited, meaning you can load as many exabytes as you need. It offers 99.99% availability, so the data stays safe and backed up through any disaster or incident that may occur. As one of Amazon's first cloud services, it is quite polished and powerful, and every AWS service integrates with S3; this is the most important reason for choosing S3 as the storage layer of a Data Lake. It also scales automatically, and you pay only for what you use, nothing more.

Another of its main characteristics is security: you can block public access and restrict permissions so that the data can only be reached by the allowed users and AWS services, and the data can be encrypted with AWS KMS (Key Management Service). You can also control properties at the level of the object itself, for example making a single file within a bucket public without having to make the entire bucket public.
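As a minimal sketch of those two ideas (encryption with KMS and per-object control), this is what uploading an encrypted object with boto3 can look like; the bucket name, object key, and KMS key ID are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Upload one object into the (hypothetical) data lake bucket,
# encrypted server-side with a customer-managed KMS key.
with open("orders.csv", "rb") as body:
    s3.put_object(
        Bucket="my-datalake-bucket",
        Key="raw/sales/2024/01/orders.csv",
        Body=body,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="1234abcd-12ab-34cd-56ef-1234567890ab",  # hypothetical key ID
    )
```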

S3 specific properties.

One of the essential properties of S3 is the number of services that allow you to ingest data into it as needed. In other words, it makes it possible to unify all the dispersed data (in other clouds, on-premises, etc.) into a single Data Lake.

In terms of cost, S3 only charges for what is used and no more. These costs depend on how frequently the user accesses the data stored in S3. S3 Standard has an estimated price of $0.021 per GB.

Next to S3 Standard is S3 Standard-IA (Infrequent Access). For less frequently accessed data, its price is almost 40% lower, and it has the same properties as S3 Standard: it spans three Availability Zones, it is available all the time, and access latency is in milliseconds. However, Amazon charges a small retrieval fee per GB, so each time you access the data there is a small charge per object requested.

Worth mentioning as well is S3 One Zone-IA, which is the same as S3 Standard-IA except that it lives in a single Availability Zone; it still offers high availability and is generally used for backups. There are also S3 Glacier, where access to the data takes minutes or hours, and S3 Glacier Deep Archive, where retrieval takes 12 to 48 hours. These are used for data accessed once or twice a year, and the cost is extremely low.
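A common way to take advantage of these storage classes is a lifecycle rule that moves data to cheaper tiers as it ages. Below is a minimal sketch with boto3; the bucket name, prefix, and transition days are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical rule: after 30 days, move objects under raw/ to Standard-IA;
# after 365 days, move them to Glacier Deep Archive.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-datalake-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-raw-data",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```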

How is data ingested into a Data Lake? Here are some AWS services that can be used to bring data in:

  • AWS Direct Connect: a dedicated, private network connection into AWS, so data is transferred securely without crossing the public internet. Recommended for large amounts of data.
  • Amazon Kinesis: for streaming data and video.
  • AWS Storage Gateway: a virtual connection between AWS and an on-premises environment that allows files to be transferred safely with all their properties preserved.
  • AWS Snowball: commonly used for physical migrations, scaling to many terabytes per device.
  • AWS Transfer for SFTP: provisions SFTP servers and can be used through a VPN.

Kinesis

Kinesis is Amazon's real-time service. It is divided into four sub-services:

  • Amazon Kinesis Video Streams, for live video: while the streaming pipeline is maintained, the video can be ingested into S3 in real time or analyzed on the fly.
  • Amazon Kinesis Data Firehose, which ingests data in near real time into S3, Redshift, or Elasticsearch. If an application emits events or logs all the time, Firehose delivers that data continuously to its destination.
  • Amazon Kinesis Data Streams, which provides real-time data streaming but is usually used to send data to applications, directly to an EC2 instance for processing, or straight into Amazon Kinesis Data Analytics.
  • Amazon Kinesis Data Analytics, for real-time analytics: it allows you to query the data as it passes through, live.
The four Kinesis sub-services. Source: AWS.

An essential property of Kinesis is that it is serverless; you pay only for what you use.
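As a minimal sketch of the Firehose case, an application could push each event or log line into a delivery stream with boto3; the stream name and payload below are hypothetical.

```python
import json
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

# Hypothetical application event to be delivered to S3 in near real time
event = {"user_id": "1234", "action": "checkout", "amount": 49.90}

firehose.put_record(
    DeliveryStreamName="app-events-to-s3",  # hypothetical delivery stream
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```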

AWS Glue

How is data consumed from a Data Lake? The answer starts with AWS Glue, an Amazon service with two main parts. The first is the Data Catalog, where all the data is cataloged and all the metadata is stored. It keeps the Data Lake organized so that other services can later consume it; having a data catalog is crucial. Within Glue there is a component called a Crawler, which extracts the metadata from all the data automatically and serverlessly: you create a Crawler, it extracts the metadata, and you are charged only for the minutes the Crawler took to do it. The data store can be S3 or any other supported storage. The result is saved in the Data Catalog as a database containing tables with all the necessary information registered. The formats supported by crawlers are CSV, Avro, Ion, Grok logs, JSON, XML, Parquet, and Glue Parquet.
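A minimal sketch of creating and running such a Crawler with boto3 might look like this; the crawler name, IAM role, catalog database, and S3 path are hypothetical.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Create a crawler that catalogs everything under the raw/ prefix
glue.create_crawler(
    Name="raw-data-crawler",                                # hypothetical name
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical IAM role
    DatabaseName="my_datalake_db",                          # catalog database to populate
    Targets={"S3Targets": [{"Path": "s3://my-datalake-bucket/raw/"}]},
)

# Run it; the extracted metadata ends up as tables in the Data Catalog
glue.start_crawler(Name="raw-data-crawler")
```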

Queries in an Amazon S3 Data Lake.

The second part is ETL, a significant concept in the world of Data Lakes and Big Data: all the data is extracted from a data source, transformed by a script running in an engine, and then loaded, already transformed, into a target. The data source and the data target do not have to be different; they can be the same.

The supported data sources and targets are Amazon S3, RDS, Redshift, and JDBC connections.

AWS Glue Jobs is a service that lets you run such a script serverlessly. You can add a trigger to it; for example, every time a file lands in S3, the job runs automatically. However, the data must already be cataloged to use a Job, since tables can only be created from cataloged data. For example, to go from S3 to Redshift, the metadata must be present to create the Redshift tables; otherwise, they must be created manually. The Job procedure is then as follows:

  • trigger the job (on demand or through a specific trigger),
  • extract the data from the source,
  • run a script that transforms the data, and
  • load the transformed data into the target.

It is essential to know that you do not need to know how to program in Python to run the script, because Amazon offers the possibility of simply specifying the transformations you want, and it writes the script automatically. If a modification is required, the generated script is available for editing. This is one of the main advantages of AWS Glue Jobs.
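For reference, a script of the kind Glue generates typically follows the extract-transform-load structure described above. The sketch below assumes a catalog database, table, field names, and output path that are all hypothetical.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Extract: read the table the crawler registered in the Data Catalog
source = glue_context.create_dynamic_frame.from_catalog(
    database="my_datalake_db",  # hypothetical catalog database
    table_name="raw_orders",    # hypothetical table
)

# Transform: keep and rename only the fields we care about
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("order_id", "string", "order_id", "string"),
        ("amount", "double", "amount", "double"),
        ("ts", "string", "order_time", "timestamp"),
    ],
)

# Load: write the transformed data back to S3 as Parquet
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://my-datalake-bucket/curated/orders/"},
    format="parquet",
)

job.commit()
```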

Amazon Athena

Another way to consume data from a Data Lake is Amazon Athena, a serverless Amazon service that lets you run SQL queries directly against S3. Queries process the data at high speed and require almost no configuration: just go to the Athena console, indicate which data to analyze, and start writing SQL. The data does need to be cataloged, either with a Glue crawler or by defining the tables by hand. You only pay for the data scanned: if a query scans 1 GB, you are charged for just that 1 GB.
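A minimal boto3 sketch of running such a query could look like the following; the database, table, and results location are hypothetical.

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Start a query against a (hypothetical) cataloged table backed by S3
execution = athena.start_query_execution(
    QueryString="SELECT order_id, amount FROM raw_orders LIMIT 10",
    QueryExecutionContext={"Database": "my_datalake_db"},
    ResultConfiguration={"OutputLocation": "s3://my-datalake-bucket/athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then read the rows
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(rows)
```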

Amazon Athena can be queried from anywhere. For example, a Business Intelligence tool that needs to consume data from S3 can connect to Athena and run its queries against S3 directly. The BI tool where all the dashboards are displayed then has the connection and processing capacity to bring in the data without having to move it all into a Data Warehouse first.


Amazon EMR (Elastic MapReduce)

Finally, we will talk about Amazon EMR (Elastic MapReduce), Amazon's Big Data service par excellence. It lets you deploy applications for the main open-source frameworks, such as Apache Spark, Hadoop, Presto, and Hive, and configure everything in cluster mode. It is self-scaling with high availability, which matters because there are situations where a large amount of data must be processed at a specific time; you are charged only for the time used, which saves a lot of money. It offers data redundancy, so whatever happens, the data remains available to the user. It is easy to administer and configure: from the console you choose the frameworks you want, the number and type of nodes, and so on, and EMR sets everything up automatically. Amazon EMR is tightly integrated with the Data Lake and all of the services listed above.
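As a minimal sketch, launching a small Spark cluster with boto3 might look like this; the cluster name, instance types, node count, and log location are hypothetical, and the default EMR IAM roles are assumed to already exist in the account.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Launch a small (hypothetical) Spark cluster that terminates when idle
response = emr.run_job_flow(
    Name="datalake-processing-cluster",
    ReleaseLabel="emr-6.9.0",
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    LogUri="s3://my-datalake-bucket/emr-logs/",
    JobFlowRole="EMR_EC2_DefaultRole",  # default instance profile, assumed to exist
    ServiceRole="EMR_DefaultRole",      # default service role, assumed to exist
)

print(response["JobFlowId"])
```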

After all the data has been ingested and processed, the part that business people are most interested in arrives. Amazon's Business Intelligence service is called Amazon QuickSight, the first Business Intelligence service that charges per session: you only pay each time the QuickSight console is used, not per user or per license. As in any Business Intelligence tool, there are two types of users: authors, who prepare and explore the data, and readers, who view the dashboards to make decisions.

At DinoCloud, we take care of turning a company's current infrastructure into a modern, scalable, high-performance, and low-cost infrastructure capable of meeting its business objectives. If you want more information on how to optimize the way your company organizes and analyzes data, and reduce costs, you can contact us here.

Francisco Semino

Lead Solutions Architect
@DinoCloud


Social Media:

LinkedIn: https://www.linkedin.com/company/dinocloud
Twitter: https://twitter.com/dinocloud_
Instagram: @dinocloud_
Youtube: https://www.youtube.com/c/DinoCloudConsulting

Analyzing Data

Among other things, data analysis makes it possible to learn from previous events what needs to be improved.

Written by William Díaz Tafur

Data analysis is vital for companies because it provides the answers the business needs in order to innovate in any area.

Furthermore, decisions made from data have a very high rate of effectiveness. In this way, it is possible to learn from previous events what needs to be improved, since making a decision blindly or guided by instinct is not the same as making one based on data obtained from previous operations.

Carrying out operations

On the other hand, the data can be used by an application that performs operations automatically, making decisions itself based on previous situations, or it can be used in the visualization stage, where a person looks at the data and makes decisions from it.

Similarly, the hypotheses or theories that companies raise in their business area are validated against the results of the (more or less intelligent) analysis of the data they already possess or are starting to process thanks to data engineering.

Uses and tools

The most common uses are log analysis, e-commerce personalization or recommendation engines, fraud detection and financial reports, among many others.


Moreover, regarding tools for data analysis, the choice depends on the type of analysis needed. The best known are the Apache big data frameworks, which can also be run on AWS through the EMR service.

Machine Learning

Within data analysis there are also what are known as machine learning techniques, which allow a “machine” to learn from past data in order to analyze current information.


For example, in a company dedicated to e-commerce, a machine learning model can be trained so that, given a transaction, it says whether it is fraud or not.


This model is trained beforehand with the business's historical transaction data: the more past data it has, the more effective it is, and in turn it keeps learning the more it is used.
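As a minimal sketch of that idea (not a production fraud system), a simple classifier could be trained on a hypothetical table of historical transactions; the file name, column names, and choice of scikit-learn are all assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical export of historical transactions with an is_fraud label
data = pd.read_csv("historical_transactions.csv")
features = data[["amount", "hour_of_day", "num_items", "customer_age_days"]]
labels = data["is_fraud"]

# Hold out part of the history to check how well the model generalizes
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Evaluate on unseen transactions; in practice the model would be
# retrained periodically as new labeled transactions arrive
print(classification_report(y_test, model.predict(X_test)))
```

A random forest is just one reasonable starting point; the important part is the loop of training on past data and retraining as new data arrives, which is exactly what makes the model more effective the more it is used.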

